PROTEST UNDER 37 C.F.R. § 1.291(a)

This paper is submitted under 37 C.F.R. § 1.291(a) in protest of the possible
issuance of a U.S. patent based on application no. 08/434,105 ("the '105
application"), filed May 3, 1995, in the names of David A. Fischhoff and Frederick J.
Perlak (collectively "Fischhoff").^1 The '105 application is assigned to Monsanto
Technology LLC ("Monsanto"). It is being returned to post-interference, ex parte
prosecution following entry of Consent Judgment in Mycogen Corp. v. Monsanto Co.,
Civ. Action No. 1:04-CV-0573 DFH-WTL (S.D. Ind.) (attached as Exh. B), an action
brought by Mycogen after the USPTO Board of Patent Appeals and Interferences
awarded Fischhoff priority of invention over Mycogen's inventors in Interference No.
103,781 ("the interference"). The Board's decision issued on February 2, 2004.

^1 A diagram showing the relationships of the applications and patents in this family is
attached as Exh. A.

Protestor believes the '105 application includes at least one claim to a synthetic
gene derived from a Bacillus thuringiensis insecticidal protein toxin gene. On return to
ex parte examination, the Examiner should not pass claims of the '105 application
directed to a synthetic gene to issue, but instead should reject such claims on the
judicially-created ground of obviousness-type double patenting. As explained below, on
information and belief concerning the presence of synthetic gene claim 40 in the '105
application, Protestor submits that the subject matter sought to be patented is merely an
obvious variation of subject matter already patented by Fischhoff in U.S. Patent Nos.
5,500,365 ("the '365 patent"), issued March 19, 1996 (attached as Exh. C), and
5,880,275 ("the '275 patent"), issued March 9, 1999 (attached as Exh. D). Under the
principles of obviousness-type double patenting, such claims are not permitted without a
terminal disclaimer. Moreover, double patenting is not an issue that appears to have
been raised in the interference. Accordingly, for the following reasons, the Examiner
should reject any gene claims in the '105 application.

I. Compliance With 37 C.F.R. § 1.291

This Protest complies with the requirements of 37 C.F.R. § 1.291. First, under
§ 1.291(a), a protest must adequately identify the application being protested so that it
can be matched to the application. As indicated above, this Protest is against U.S.
application no. 08/434,105, entitled "Synthetic Plant Genes And Method For
Preparation," filed by Fischhoff on May 3, 1995, and which, based on information
available on PAIR, apparently will be returned to Group Art Unit 1638 and Examiner
Kubelik. Public PAIR also indicates Confirmation No. 2627 for this application. Thus,
the Office should be able to readily match the Protest to the application. Jurisdiction
over the application may initially pass through the Board. If so, Protestor has identified
Interference No. 103,781, again making it possible to match the Protest to the
application.

Second, a protest may be filed in an application in accordance with § 1.291(b)
prior to publication pursuant to § 1.211 or mailing of a Notice of Allowance. To
Protestor's knowledge, neither of those actions has occurred in the '105 application as
of the filing of this Protest.

Third, in accordance with § 1.291(b)(2), this Protest is the first and only protest
filed against the '105 application on behalf of the real party in interest submitting it.

Fourth, a certificate indicating service of this Protest on Fischhoff's attorney or
agent is attached, in compliance with § 1.291(b).

Fifth, in compliance with § 1.291(c), the patents relied on by Protestor are listed
on page 2, a concise explanation of the relevance of the patents is provided below, and
a copy of each patent is included with this Protest as Exhs. C and D.

Accordingly, it is submitted that all of the requirements for a protest are satisfied. 
The Examiner should therefore accept this Protest and should apply the double 
patenting ground of rejection it raises. 



II. Claim 40 Is Unpatentable Under the
Doctrine of Obviousness-Type Double Patenting

On information and belief, there are four claims of the '105 application involved in
the interference. Claims 3, 5, and 39 are method claims. Claim 40 is directed to a
synthetic gene:

40. A synthetic gene which is derived from a Bacillus thuringiensis 
insecticidal protein toxin gene and which is more highly expressed in 
plants, wherein the coding sequence of said synthetic gene is modified to 
contain: 

a) a greater number of codons preferred by the intended plant host 
than said insecticidal protein toxin gene; and 

b) fewer polyadenylation signal sequences than said insecticidal 
protein toxin gene. 

The claims of the '365 and '275 patents are also directed to genes encoding
modified insecticidal toxin proteins. As the Protestor will explain, claims in those two
patents are directed to subject matter that falls within the broader scope of '105
application claim 40. In other words, a synthetic gene of claim 40 is clearly
unpatentable over one or more issued claims in the patents. It is an "obvious variant"
within the meaning of obviousness-type double patenting. Granting such a claim as
claim 40 would effectively extend the term of the '365 and '275 patents. Therefore,
claim 40, and any other such gene claims, are subject to rejection on the non-statutory
ground of double patenting.
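
The two limitations of claim 40 are comparative counts, so the test for whether a coding
sequence falls within the claim can be illustrated with a short sketch. The Python below
is illustrative only and is not drawn from the '105 application or from either patent. The
function names, the set of "preferred" codons, the use of AATAAA as a stand-in for a
polyadenylation signal sequence, and the toy sequences are all assumptions made for the
example.

```python
# Illustrative sketch only: every name, motif, and sequence here is an assumption.
POLYA_MOTIFS = ("AATAAA",)                # hypothetical polyadenylation signal motif
PREFERRED_CODONS = {"CTC", "GCC", "AAG"}  # hypothetical plant-preferred codons


def count_preferred_codons(seq, preferred=PREFERRED_CODONS):
    """Count in-frame codons that appear in the preferred set."""
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return sum(1 for codon in codons if codon in preferred)


def count_motifs(seq, motifs=POLYA_MOTIFS):
    """Count occurrences (overlaps included) of each motif in the sequence."""
    total = 0
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            total += 1
            start = seq.find(motif, start + 1)
    return total


def meets_claim_40_limitations(native, synthetic):
    """True if the synthetic sequence has (a) more preferred codons and
    (b) fewer polyadenylation signal motifs than the native sequence."""
    more_preferred = count_preferred_codons(synthetic) > count_preferred_codons(native)
    fewer_signals = count_motifs(synthetic) < count_motifs(native)
    return more_preferred and fewer_signals


# Toy sequences, not real gene sequences; both encode Leu-Ala-Asn-Lys-Lys.
native = "TTAGCAAATAAAAAA"      # codons: TTA GCA AAT AAA AAA
synthetic = "CTCGCCAACAAGAAG"   # codons: CTC GCC AAC AAG AAG
print(meets_claim_40_limitations(native, synthetic))
```

Because both limitations are relative to the starting gene, any narrower claim that
requires more preferred codons and fewer such signals than the native sequence
describes sequences that would also pass this kind of comparison.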



A. Claim 40 Is Unpatentable Over '365 Patent Claim 5

Claim 5 of the '365 patent,^2 which depends from independent claim 4, is directed
to a modified chimeric gene. The gene comprises three components: (1) a promoter;
(2) a structural coding sequence; and (3) a 3' nontranslated region comprising a
polyadenylation signal. The coding sequence is required to have certain characteristics:
(1) a specific sequence of 37 nucleotides found in the naturally occurring B.
thuringiensis sequence; (2) at least one fewer plant polyadenylation sequence (or
ATTTA sequence) compared to the naturally occurring sequence; and (3) an increased
number of plant preferred codons compared to the naturally occurring sequence. The
chart attached as Exh. E compares the language of '105 application claim 40 and '365
patent claim 5, with claim 40 rearranged to track the language in claim 5. It is evident
from this comparison that '105 application claim 40 defines a genus which
encompasses the subgenus recited by '365 patent claim 5.^3 Claim 40 is therefore
patentably indistinct from claim 5. Under the test of non-statutory (obviousness-type)
double patenting, "the examiner asks whether the application claims are obvious over
the patent claims." In re Berg, 140 F.3d 1428, 1432 (Fed. Cir. 1998) ("one-way test").

^2 The '365 patent issued on March 19, 1996, from an application filed October 9, 1992.
Hence, the term of the '365 patent is 17 years from the date of issue. It will expire on
March 19, 2013.
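
The expiration dates relied on in this Protest are simple date arithmetic, which can be
checked with the sketch below. It assumes only the dates and term lengths stated in this
Protest (17 years from issue for the '365 patent; 20 years from the earliest claimed U.S.
benefit date of February 24, 1989 for the '275 patent). The helper name add_years is
hypothetical, and the sketch ignores any term adjustment, extension, or disclaimer.

```python
from datetime import date


def add_years(d, years):
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


# Dates as stated in this Protest.
issue_365 = date(1996, 3, 19)             # '365 patent issue date
earliest_benefit_275 = date(1989, 2, 24)  # earliest claimed benefit date, '275 patent

expiry_365 = add_years(issue_365, 17)             # 17 years from issue
expiry_275 = add_years(earliest_benefit_275, 20)  # 20 years from earliest benefit date

print(expiry_365.isoformat())  # 2013-03-19
print(expiry_275.isoformat())  # 2009-02-24
```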

^3 In the interference, Monsanto apparently characterized claim 40 as defining a genus
of genetic sequences. See Board Opinion and Final Order dated February 2, 2004,
page 36.

Applying that test, it is clear that an earlier patented species or subgenus at least
renders obvious (in fact, anticipates) a genus claim which encompasses them.
Accordingly, the Examiner should reject claim 40, and any patentably indistinct gene
claims, for obviousness-type double patenting over '365 patent claim 5.

In view of the prosecution history, the one-way test is the appropriate way to
determine whether there is double patenting. Monsanto initially pursued broad claims in
the application that issued as the '365 patent, for example, application claim 45 entered
by the Preliminary Amendment dated January 11, 1993. Ultimately, Monsanto's claims,
including claim 45, were rejected as obvious over a combination of art including U.S.
Patent No. 5,380,831 to Adang et al.^4 Office Action mailed October 4, 1995 (Paper No.
36, p. 4). In reply, Monsanto cancelled or amended its claims to distinguish over the
references, choosing to obtain an early allowance of narrowed claims and pursuing
broader claims, such as claim 40, in the '105 application. Having chosen to obtain an
earlier patent on narrower claims, Monsanto had to accept that any broader,
encompassing claims sought later would be subject to double patenting to prevent
effectively extending the term of the earlier claims.

In some narrow circumstances, a "two-way test" is applied as an exception to the
one-way test. But this two-way test only applies when an applicant could not avoid
filing claims in separate applications, and even then, only if the Office controlled the
rates of prosecution so as to cause later-filed species claims to issue before claims in
an earlier application to a genus. Berg, 140 F.3d at 1434. That is not the situation here.

^4 This patent was also involved in the interference.

As in the case of Berg, where the court applied the one-way test, the facts here
are also similar to those of In re Goodman, 11 F.3d 1046 (Fed. Cir. 1993). In Goodman,
the court affirmed the Board's judgment that claims under examination directed to a
method for producing a mammalian peptide in plant cells were obvious variants of
narrower patented claims directed to methods of producing interferon, a mammalian
protein, in dicotyledonous plant cells. Before reaching the substantive issue, the court
considered whether it needed to apply the two-way test under the facts of the case.

Specifically, Goodman's broad claims had been rejected for lack of enablement.
After failing to persuade the examiner that the specification enabled the broad claims,
Goodman amended the claims to narrow them to a scope that in the examiner's view
was enabled. The narrowed claims issued in U.S. Patent No. 4,956,282 ("the '282
patent"). Goodman also filed a continuation application, the '380 application, to pursue
the broader claims. The examiner again rejected the broad claims for lack of
enablement on the ground that the claimed methods could not be performed in all plant
cells without undue experimentation. The '380 application claims were also rejected for
obviousness-type double patenting. The Board upheld both rejections and Goodman
appealed to the court.

The court held that the one-way test applied because PTO actions did not dictate
the rate of prosecution of the species and genus claims. Instead, Goodman chose to
file a continuation directed to the broader subject matter, while seeking earlier issuance
of the narrow species claims. As the court observed:

Appellant's position [that a terminal disclaimer is unwarranted]
could extend the term of the patent grant for many cases in a similar
posture. By adopting the easy course of filing a continuation or divisional
application to gain a narrow claim, a patentee could gain an extension of
term on a species when the broad genus later issued. . . .

Claims 12 and 13 are generic to the species of invention covered by
claim 3 of the patent. Thus, the generic invention is "anticipated" by the
species of the patented invention. This court's predecessor has held that,
without a terminal disclaimer, the species claims preclude issuance of the
generic application.

Goodman, 11 F.3d at 1053 (citations omitted).

Thus, where an applicant seeks an early allowance of narrow claims and also
chooses to pursue the broader, rejected claims in a continuation application, the
one-way test applies in determining whether the broader claims are unpatentable as
obvious variants of (or anticipated by) the patented narrow claims. Here, Monsanto made
the decision to narrow its claims and seek broader claims in a continuation application,
just like Goodman. Moreover, Monsanto made that decision before the interference was
declared and, therefore, the fact that issuance of the broader subject matter of '105
application claim 40 was later delayed by the interference has no bearing on the double
patenting rejection.

Returning to Exh. E and the chart comparing '105 application claim 40 and '365
patent claim 5, under the one-way test, it is evident that the modified chimeric gene of
'365 patent claim 5 is a subgenus of the genus of synthetic genes recited in '105
application claim 40. Claim 40, therefore, is an obvious variant of claim 5, and
unpatentable for obviousness-type double patenting. If claim 40 were permitted to
issue, Monsanto could assert that claim against a modified chimeric gene falling within
the scope of '365 patent claim 5. In other words, granting claim 40 would provide
Monsanto with an extension of patent term for the subject matter of '365 patent claim 5.
As the Goodman court noted, this is precisely the occurrence that the doctrine of
obviousness-type double patenting is intended to prevent.

For these reasons, '105 application claim 40 is properly rejected under the 
doctrine of obviousness-type double patenting over at least claim 5 of the '365 patent. 

B. Claim 40 Is Unpatentable Over Claim 3 of the '275 Patent

There are eight independent species claims in the '275 patent.^5 It is sufficient for
the Examiner to make a double patenting rejection of claim 40 if that claim is patentably
indistinct from (anticipated by or obvious from) at least one claim of the '275 patent. As
explained below, '275 patent claim 3 is directed to a species within the scope of '105
application claim 40. Thus, claim 40 is properly rejected for obviousness-type double
patenting over at least that issued claim. While the Protestor has focused on claim 3,
the Examiner can similarly consider other claims of the '275 patent as well.

Claim 3 recites "[a] heterologous gene construct comprising a structural coding
sequence which encodes an insecticidal protein derived from B.t.k. HD-1, said structural
coding sequence comprising SEQ ID NO:22." The identity of SEQ ID NO:22 is
explained in the '275 specification. Example 2 of the '275 patent, which begins on col.
19, describes a fully synthetic B.t.k. HD-1 gene. As described, the gene "was designed
using the preferred plant codons listed in Table V below." Col. 19, lines 59-60.
Comparison of the last two columns in Table V reveals that the fully synthetic gene
contains a greater number of plant preferred codons than the wild-type gene. In
addition, according to the '275 patent at col. 16, lines 5-10, the wild-type HD-1 gene
contains 18 potential polyadenylation sites. In describing the synthetic gene, Monsanto
states: "The resulting synthetic gene lacks ATTTA sequences, contains only one
potential polyadenylation site and has a G+C content of 48.5%. Fig. 3 is a comparison
of the wild-type HD-1 sequence to the synthetic gene sequence for amino acids 1-615."
Col. 20, lines 60-64. According to the "Brief Description Of The Drawings," "FIGS. 3A-3C
illustrate a comparison of the changes in the synthetic B.t.k. HD-1 sequence of
Example 2 (lower line) (SEQ ID NO:22) versus the wild-type sequence of B.t.k. HD-1
which encodes the crystal protein toxin (upper line)." Hence, the fully synthetic gene
sequence described in Example 2 is the same as SEQ ID NO:22.

^5 The '275 patent issued from an application filed April 29, 1997, which claims benefit of
application no. 07/315,355 filed February 24, 1989. Thus, the '275 patent expires 20
years from that earliest claimed date of U.S. benefit, that is, February 24, 2009.

Claim 3, therefore, is a species claim directed to a synthetic gene derived from a
B. thuringiensis insecticidal protein toxin gene that has been modified to contain a
greater number of plant preferred codons than the wild-type insecticidal protein toxin
gene and fewer polyadenylation signal sequences than the wild-type toxin gene. As
explained in Example 4, Table VIII (col. 24), tobacco plants containing the fully synthetic
B.t.k. HD-1 gene demonstrated a 500-fold increase in expression compared to the
wild-type gene. Claim 3, therefore, is a species of the genus recited by '105 application
claim 40.

From the analysis above, it is evident that '105 application claim 40 is an obvious
variant of at least '275 patent claim 3. Accordingly, the Examiner should reject claim 40
for obviousness-type double patenting.



III. 35 U.S.C. § 121 Does Not Shield '105 Application 
Claim 40 from Unpatentability Under the Doctrine 
of Obviousness-Type Double Patenting 

In situations where a patent application claims two or more independent and
distinct inventions, 35 U.S.C. § 121 authorizes the Director to require the applicant to
restrict the claims to one of the inventions. Section 121 also shields an applicant who,
in a divisional application, pursues claims to an invention restricted out of a parent
application by prohibiting the use of parent and divisional applications as references
against each other, as long as the divisional application is filed before the issuance of
the parent application.^6 The protections of Section 121, however, only apply when the
applicant maintains consonance with the restriction requirement. In other words,
"[s]ection 121 shields claims against a double patenting challenge if consonance exists
between the divided groups of claims and an earlier restriction requirement." Geneva
Pharm., Inc. v. GlaxoSmithKline PLC, 349 F.3d 1373, 1381 (Fed. Cir. 2003), citing
Symbol Techs., Inc. v. Opticon, Inc., 935 F.2d 1569, 1579 (Fed. Cir. 1991).

^6 The third sentence of Section 121 states: "A patent issuing on an application with
respect to which a requirement for restriction under this section has been made, or on
an application filed as a result of such a requirement, shall not be used as a reference
either in the Patent and Trademark Office or in the courts against a divisional
application or against the original application or any patent issued on either of them, if
the divisional application is filed before the issuance of the patent on the other
application."

When an applicant includes new or amended claims in a divisional application
that cross over the boundaries set by the restriction requirement, thereby claiming the
invention that was elected in the parent case, the benefits of Section 121 no longer
apply. For example, if gene claims were restricted from method claims and the gene
claims elected and patented, the applicant cannot later claim patentably indistinct genes
and avoid a double patenting rejection.

That is the situation here. The examiner imposed a restriction requirement 
during prosecution. Monsanto has not maintained consonance with that restriction 
requirement. Accordingly, the protections of Section 121 do not apply. 

A. The Relevant Prosecution History

1. Application No. 07/476,661

Application no. 07/476,661 ("the '661 application") was filed on February 12,
1990. There were 38 claims in the '661 application as filed. Claims 1-12 and 27 were
drawn to methods. Claims 13-26 and 28-38 were drawn to compositions, for example,
structural genes, vectors, and transformed plant cells. The following claims are
representative of the composition claims:

13. A structural gene which encodes an insecticidal protein of 
Bacillus thuringiensis, said gene being substantially devoid of 
polyadenylation signals and ATTTA sequences. 

28. A structural gene sequence of Claim 13 comprising a majority 
of plant preferred codons. 

Claims 3 and 4 are representative of the method claims: 

3. A method for modifying a wild-type structural gene sequence
which encodes an insecticidal protein of Bacillus thuringiensis to enhance
the expression of said protein in plants which comprises:

a) removing polyadenylation signals contained in said wild-type
gene while retaining a sequence which encodes said protein;
and

b) removing ATTTA sequences contained in said wild-type
gene while retaining a sequence which encodes said protein.

4. A method of claim 3 further comprising the removal of self- 
complementary sequences and replacement of such sequences with 
nonself-complementary DNA comprising plant preferred codons while 
retaining a structural gene sequence encoding said protein. 

The Office imposed a restriction requirement in the Office Action mailed October
1, 1991 (Paper No. 2). According to the Office, the claimed methods and compositions
were directed to separate and distinct inventions:

Restriction to one of the following inventions is required under 35
U.S.C. § 121:

I. Claims 1-12, and 27, drawn to a method for improving
transformed gene[s] in plants, classified in Class 435, subclass 172.1.

II. Claims 13-26, and 28-38, drawn to a modified B.t. toxin
structural gene, DNA per se, and transformed plant cells, etc., classified in
Class 435, subclass 240.4.

Paper No. 2, p. 2. The Office explained its reasoning as to why the two groups of
claims were separate and distinct. It also indicated that there had been a request for
oral election, but no election was made by Monsanto in response to the request. Id., p.
3.

Monsanto filed a one-page response to the restriction requirement on October
29, 1991, "elect[ing] with traverse the Invention of Group II, Claims 13-26 and 28-38 for
examination purposes." Monsanto did not provide any argument that the restriction
between Groups I and II was improper and should be withdrawn. Instead, Monsanto
only argued that the examiner should include claims 9-12 in Group II because "[t]he
reasons for restriction set forth in the Office Letter do not apply to Claims 9-12 because
these claims are limited to methods for making Bt toxin genes of Claim 13."

The restriction requirement was made final in the Office Action mailed February
3, 1992 (Paper No. 6). The examiner declined to include claims 9-12 in Group II. Paper
No. 6, p. 2. Thus, method claims 1-12 and 27 were withdrawn from consideration as
directed to a nonelected invention. There is no indication in the prosecution history of
the '661 application that Monsanto filed a petition seeking review of the restriction
requirement.

Prosecution proceeded on composition claims 13-26 and 28-38. Those claims 
were rejected. Ultimately, Monsanto abandoned the '661 application in favor of a 
continuation. 

2. Application No. 07/959,506 (U.S. Patent No. 5,500,365)

That continuation, application no. 07/959,506 ("the '506 application"), was filed
on October 9, 1992. The '506 application was filed under 37 C.F.R. § 1.62 as a file
wrapper continuation application. Therefore, prosecution of the elected Group II claims
continued in the '506 application. There was no restriction requirement imposed by the
examiner during prosecution of the '506 application. However, Monsanto never
introduced a method claim during prosecution of the '506 application, thus maintaining
consonance with the parent '661 application restriction requirement. The various
composition claims in the application were examined to allowance. Those claims
issued in the '365 patent on March 19, 1996.

3. Application No. 08/433,111

Application no. 08/433,111 ("the '111 application") was filed on May 3, 1995, as a
"divisional" of the '506 application. However, the original 38 claims, that is, the same
set of claims that were restricted in the '661 application, were examined by an examiner
who had not been responsible for any of the prior applications in this family. At no time
did Monsanto bring to the attention of the examiner the restriction requirement entered
in the '661 application, and consonance was not maintained with the restriction
requirement in the '661 application. This application was eventually abandoned for
failure to respond to an Office Action. Before that occurred, on December 13, 1996,
Monsanto filed a "Notice of the Declaration of an Interference Involving a Related
Application," wherein it stated: "This is to advise the examiner that interference no.
103,781 has been declared involving application No. 08/434,105. Application serial No.
08/434,105 was filed May 3, 1995 and is a continuation of serial No. 07/959,506, filed
October 09, 1992, which is the parent of this application."

4. Application No. 08/841,178 (U.S. Patent No. 5,880,275)

Monsanto filed application no. 08/841,178 ("the '178 application") on April 27,
1997, as a continuation of the '111 application. With the application, Monsanto filed a
Preliminary Amendment canceling original claims 1-38 and entering claims 39-43. All of
those claims were directed to chimeric genes. Section 121 is not applicable to gene
claims since gene claims were elected in response to the restriction requirement in the
'661 application. Consonance would be maintained, and Section 121 would afford
protection against a double patenting rejection, only for method claims, which are the
claimed subject matter that was restricted and not elected.



Monsanto then filed a Second Preliminary Amendment on July 28, 1998,
canceling claims 39-43 and entering claims 44-51 directed to species of genes
encoding insecticidal B. thuringiensis proteins. Those claims were replaced with claims
52-59, submitted in a Third Preliminary Amendment filed on September 8, 1998, which
were directed to heterologous genes comprising specific SEQ ID NOs.

At no time did Monsanto pursue method claims in the '178 application.
Monsanto received a first action allowance and application claims 52-59 issued as
claims 1-8 in the '275 patent.

B. Monsanto Has Not Maintained Consonance

During prosecution of the '661 application the examiner imposed a restriction
requirement providing a clear demarcation between the restricted subject matter. See
Geneva Pharm., 349 F.3d at 1381 ("[R]estriction requirements must provide a clear
demarcation between restricted subject matter to allow determination that claims in
continuing applications are consonant and therefore deserving of § 121's protections.").
That clear demarcation divided the claims into two well-defined groups: (1) methods for
improving transformed gene[s] in plants (claims 1-12, and 27); and (2) modified B.t.
toxin structural genes, DNAs per se, and transformed plant cells, etc. (claims 13-26, and
28-38). In response to the restriction requirement, Monsanto elected to prosecute the
composition claims. By prosecuting a composition claim, that is, claim 40, in the '105
application, Monsanto has crossed over the clear line of demarcation set by the
examiner.

The mere fact that claim 40 is directed to a composition of matter is sufficient for
the Office to conclude that in the '105 application Monsanto has not maintained
consonance with the restriction requirement. Yet a comparison of '105 application claim
40 and '661 application claim 28 (rewritten in independent form and restructured to
match the limitations in claim 40) reveals beyond doubt that claim 40 is directed to
essentially the same subject matter that Monsanto elected and prosecuted in the '661
application:

'105 Application Claim 40                     '661 Application Claim 28

40. A synthetic gene which is derived         28. A structural gene which encodes an
from a Bacillus thuringiensis insecticidal    insecticidal protein of Bacillus
protein toxin gene and which is more          thuringiensis, comprising
highly expressed in plants, wherein the
coding sequence of said synthetic gene is
modified to contain:

a) a greater number of codons                 a majority of plant preferred codons
preferred by the intended plant host than
said insecticidal protein toxin gene; and

b) fewer polyadenylation signal               said gene being substantially devoid of
sequences than said insecticidal protein      polyadenylation signals and ATTTA
toxin gene.                                   sequences

Because of Monsanto's failure to maintain consonance with the restriction requirement,
claim 40 is not shielded against a double patenting challenge. In view of the above, the
Protestor respectfully requests that the Examiner reject claim 40 as unpatentable for
obviousness-type double patenting over at least claim 5 of the '365 patent and claim 3
of the '275 patent.

Respectfully submitted, 



Dated: March 23, 2006 



By:_ 





Jozsef Christopher GAAL

Syngenta Limited
Jealott's Hill International Research Centre,
Bracknell, Berks RG42 6EY
UNITED KINGDOM

CERTIFICATE OF SERVICE

I certify that, in accordance with 37 C.F.R. § 1.248, copies of the foregoing
PROTEST UNDER 37 C.F.R. § 1.291(a), and Exhibits A-E cited therein, were
served on the Applicants through their attorney of record on this the 5th day of April,
2006, as follows:



Via First Class Mail and FedEx to: 

Lawrence M. Lavin, Jr., Esq. 
Monsanto Company 
800 N. Lindbergh Blvd. NZNB 
St. Louis, Missouri 63167 

EXHIBIT B

IN THE UNITED STATES DISTRICT COURT
FOR THE SOUTHERN DISTRICT OF INDIANA
INDIANAPOLIS DIVISION

MYCOGEN CORPORATION and
MYCOGEN PLANT SCIENCE, INC.,

Plaintiffs,

v.

MONSANTO COMPANY and
MONSANTO TECHNOLOGY, LLC,

Defendants.

CIVIL ACTION NO.
1:04-CV-0573 DFH-WTL

CONSENT JUDGMENT AND ORDER



Plaintiffs, Mycogen Corporation and Mycogen Plant Science, Inc. ("Mycogen"), and
Defendants, Monsanto Company and Monsanto Technology, LLC ("Monsanto"), having agreed
to a settlement of this action and having consented to entry of this Judgment,

IT IS HEREBY ORDERED, ADJUDGED AND DECREED that:

1. This Court has jurisdiction over the Parties and subject matter of this action.

2. This 35 U.S.C. § 146 action arose from the January 29, 2004 Final Decision of the
U.S. Patent and Trademark Office Board of Patent Appeals and Interferences in Interference No.
103,781 (the "'781 Interference").

3. The January 29, 2004 Final Decision of the U.S. Patent and Trademark Office
Board of Patent Appeals and Interferences entering judgment as to the subject matter of "new
Count 2," the sole Count in the '781 Interference (the "Count"), is hereby AFFIRMED. The
U.S. Patent and Trademark Office is hereby ORDERED to enter judgment as to the subject
matter of the Count in favor of David A. Fischhoff and Frederick J. Perlak and to enter judgment
that Adang, et al. are not entitled to a patent containing claims 1-12 of U.S. Patent No.
5,380,831, issued January 10, 1995.

4. All costs and attorneys' fees incurred in this action shall be borne by the 
respective parties. 

IT IS SO ORDERED. 

Entered: February 24, 2006.

DAVID F. HAMILTON, JUDGE 
United States District Court 
Southern District of Indiana 

CONSENTED AND AGREED TO: 



s/ Donald E. Knebel

Donald E. Knebel
BARNES & THORNBURG LLP
11 S. Meridian Street
Indianapolis, IN 46204
(317) 236-1313

Robert Isackson
ORRICK, HERRINGTON & SUTCLIFFE
666 Fifth Avenue
New York, NY 10103
(212) 506-5000

Attorneys for Plaintiffs, Mycogen Corporation 
and Mycogen Plant Science, Inc. 

s/ James M. Hinshaw

James M. Hinshaw
BINGHAM McHALE LLP
2700 Market Street
Indianapolis, IN 46204-4900
(317) 635-8900
(317) 968-5385

Susan K. Knoll
HOWREY LLP
1111 Louisiana, Suite 2500
Houston, TX 77002
(713) 787-1400

Attorneys for Defendants, Monsanto
Company and Monsanto Technology, LLC
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1 ATGGCTATAGAAACTGGTTACACCCCAATCGATATTTCCT 4 0 

4 1 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 80 



8 1 TGCTGG ATTTGTGTTAGG ACT AGTTGATATAATATGGGGA 120 

T C 

121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160 



161 TTGAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAG 200 
C C C G C G 

201 GAACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT 24 0 
T 

241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG 280 

281 ATCCTACT AATCC AGC ATT AAG AG AAGAG ATGCGTATTCA 320 

321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 3 60 

361 CTTTTTGC AGTTC AAAATT ATCAAGTTCCTCTTTTATC AG 4 00 

CC C C 

4 01 TATATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAG 4 4 0 
G C C CC C .CC C 

4 41 AGATGTTTCAGTGTTTGG ACAAAGGTGGGG ATTTGATGCC 4 80 

4 81 GCGACTATCAATAGTCGTTATAATGATTTAACTAGGCTTA 520 
521 TTGGCAACTATACAGATC ATGCTGTACGCTGGTACAATAC 5 60 

5 61 GGGATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGG 600 

* ■ . ■ • 

601 ATAAGATATAATCAATTTAGAAGAGAATTAACACTAACTG 64 0 
CGCCGC GCT 

641 TATTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAG 680 

681 AACGTATCCAATTCGAAC AGTTTCCG AATT AACAAG AG AA 720 

FIG. 2 A 
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721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 7 60 

• • • • 

7 61 TTCGAGGCTCGGCTC AGGGC ATAG AAGGAAGT ATTAGGAG 800 

801 TCC AC ATTTG ATGG AT AT ACTT AATAGT AT AACC AT CT AT 84 0 

841 ACGGATGCTCAT AGAGGAGAATATTATTGGTCAGGGCATC 880 

C C C T C 

881 AAATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAG AATT 920 
G C 

* • • . 

921 C ACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 960 

961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000 

1001 G AACATTATCGTCC ACCTTATATAGAAG AC CTTTTAATAT 104 0 

C 

1041 AGGGAT AAATAATCAACAACTATCTGTTCTTG ACGGG AC A 1080 
C C C C 

1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120 

1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160 

1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200 

1201 AGTCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCT 124 0 

1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280 

1281 CTCTTGGATACATCGTAGTGCTGAATTTAATAATATAATT 1320 

G C C C C C 

1321 CCTTCATCACAAATTACACAAATACCTTTAACAAAATCTA 1360 

C C C AC C C G 

1361 CTAATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGG 1400 



FIG. 2B 
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14 01 ATTTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGC 14 40 

14 41 CAGATTTCAACCTTAAGAGTAAATATTACTGCACCATTAT 14 80 

14 81 CACAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCAC 1520 

1521 AAATTTACAATTCCATACATCAATTGACGGAAGACCTATT 15 60 
CC T G C 

15 61 AATCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTA 1600 

• • • • 

1601 ATTTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTAC 1640 

1641 TCCGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTA 1680 

• • - . 

1681 AGTGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAG 1720 
1721 ATCGAATTGAATTTGTTCCGGCA 174 3 
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1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 4 0 
CCA C AC 

4 1 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80 
C C G A T C T 

8 1 AAGAATAG AAACTGGTTACACCCCAATCGATATTTCCTTG 120 
CCT C TC CC 

12 1 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 1 60 
CTGAG GCCCGCGA 

• • • • 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200 
GCTCC CC.C T 

201 TTTTGGTCCCTCTC AATGGGACGCATTTCTTGTACAAATT 24 0 
C A T C G G 

• • • • 

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280 
G G C G G C . G C 

• • • • 

2 81 ACC AAGCC ATTTCT AGATTAGAAGGACTAAGC AATCTTTA 320 

G C G G T G C 

• • • . 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 3 60 
C C T GAGC C C 

• • • . 

3 61 CCTACTAATCCAGCATTAAG AGAAGAGATGCGTATTCAAT 4 00 

C TC CC C G A 

• • • . 

4 01 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 4 4 0 

C CTGCA CAT 

• • ■ • 

4 4 1 TTTTGCAGTTCAAAATTATC AAGTTCCTCTTTTATCAGTA 4 80 
GC CGCC CGCG 

• • - * 

4 81 TATGTTC AAGCTGC AAATTT ACATTT ATCAGTTTTG AG AG 52 0 

C A T C T CC CAGC GC TC 

• • • . 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 5 60 
C AGC . G C T 

• • • . 

5 61 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 

AC C CCCCT G 

• • • . 

60 1 GGCAACTATACAGATcATGCTGTaCGCTGGTACAATACGG 64 0 
A CCCCC TT CT 

64 1 GATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGGAT 68 0 
C G C T T 



FIG. 3A 
681 AAGATATAATCAATTTAGAAGAGAATTAACACTAACTGTA 
T CCGCG GCC AT 

721 TTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAGAA 
G C T G C C CTCC 

7 61 CGTATCCAATTCGAAC AGTTTCCC AATT AAC AAGAGAAAT 
CCTCT G CTC 

• * ■ ■ 
80 1 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 

C T TCTGCCC CC 

• • • • 
841 CGAGGCTCGGCTCAGGGCATAGAAGGAAGTATTAGGAGTC 

T T T C A T C CTCC C C 

• • • • 
881 C AC ATTTG ATGG AT AT ACTT AAT AGT AT AACC ATCT AT AC 

C CCTGCC T C 

921 GGATGCTCATAGAGGAGAATATTATTGGTCAGGGCATCAA 
C C GCTACG 

• • • • 
961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 

C C ATA CAGC C G T 

r • • • 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 
CTC C C 

• . . . 
1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 

C T C C 

• . . . 
1081 AC ATT ATC GTCC AC CTT AT AT AG AAG ACCTTTT AATAT AG 

CGT GC CC C 

■ ■ • • 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 
TCCCG TC A 

1161 ATTTGCTT ATGGAACCTCCTCAAATTTGCCATCCGCTGTA 
G C C T T C T 

• • . . 
1201 TACAGAAAAAGCGG AACGGT AG ATTCGCTGG ATGAAAT AC 

G C T CT C C 

1241 CGCC ACAG AATAACAACGTGCCACCTAGGCAAGGATTTAG 
A C T C CTC 

1281 TC ATCGATTAAGCC ATGTTTCAATGTTTCGTTC AGGCTTT 
C CA G .G CGC C CAC 

1321 AGTAATAGTAGTGT AAGTAT AATAAGAGCTCCTATGTTCT 
C C TCC G C C C 

• • • • 
1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTCC 

AT G C C C 
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14 01 TTCATCACAAATT ACACAAATACCTTT AACAAAATCTACT 1440 
CT CC CAGCG 

• • • • 

14 41 AATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGGAT 1480 
C A G C 

• > • • 

14 81 TTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGCCA 1520 
C T A T 

• • • • 

1521 GATTTCAACCTTAAGAGTAAATATTACTGCACCATTATCA 1560 
AGC CC TCC CT T 

• • * ■ 

1561 CAAAGATATCGGGTAAGAATTCGCTACGCtTCTACCACAA 1600 

T C G T A A 

• • • • 

1601 ATTTACAATTCCATACATCAATTGACGGAAGACCTATTAA 164 0 
CG CCCC G C 

• • « • 

1641 TCAGGGG AATTTTTC AGCAACTATG AGTAGTGGGAGT AAT 1680 
T C C C C TCA CCCC 

• » • • 

1681 TTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTACTC 1720 
GA C CACC C 

1721 CGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTAAG 17 60 
TC CTC CTCCCT 

• * • • 

17 61 TGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAGAT 1800 
C G T G C T C 

• • . . 

1801 CGAATTGAATTTGTTCCGGCAGAAGTAACCTTTGAGGCAG 184 0 
T G GTC T C T 

1841 AATAT 1845 
G C 



FIG. 3C 
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1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 4 0 
CCA C AC 

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80 
C C G AT C T 

8 1 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120 
CCT C TC CC 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 1 60 
CTGAG GCCCGCGA 

161 CTGGATTTGTGTT AGGACTAGTTG AT ATAAT ATGGGG AAT 200 
G C TC C C C C T 

• ■ • * 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 2 4 0 
C A T C G G 

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 2 80 
G GC GGC G C 

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320 
G C G G T G C 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 3 60 
C C T GAGC . C C 

3 61 CCTACT AATCC AGC ATTAAG AGAAGAG ATGCGTATTC AAT 4 00 

C TC CC C G A 

4 01 TCAATGACATGAAC AGTGCCCTTACAACCGCTATTCCTCT 4 4 0 

C CTGCA CAT 

4 41 TTTTGCAGTTC AAAATTATCAAGTTCCTCTTTT ATC ACTA 4 80 
GC CGCC CGCG 

" • " • 

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 5 20 
C A T C T CC CAGC GC TC 

52 1 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 5 60 
C AGC G C T 

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 
AC C CCCCT G 

60 1 GGCAACTATACAG ATTATGCTGTACGCTGGTACAATACGG 64 0 
A CCCCC TT CT 

64 1 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680 
CGG C TT -' A 



FIG. 4 A 
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• • • • 

681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720 
TA CCGCG- GCCAT 

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 7 60 
G C T GT C C CTCC 

7 61 GATATCCAATTCG AAC AGTTTCCC AATT AAC AAG AG AA AT 800 
CC CTCT G CTC 

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 84 0 
C T TCTGCCC CC 

84 1 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880 
T TTCATC G CTCC C C 

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 92 0 
C C CT G C T C 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960 
C CAAGG C TACG 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCC AGAATTCA 1000 
C C ATA CAGC C G T 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040 
CTC C C 

104 1 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080 
C T C C 

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120 
CGT CGC CC C 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160 
TCCCGT C A 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200 
G C C T T C T 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 124 0 
G C T CT C C 

124 1 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280 
A C T C CTC 

• • • • 

128 1 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320 
CCAGG CGC C CAC 

1321 AGTAATAGTAGTGTAAGTAT AAT AAG AGCTCCTATGTTCT 1360 
C C TCC G C C C 

13 61 CTTGGATACATCGTAGTGCTGAATTT AATAATATAATTGC 14 00 

C G C C C C C 
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1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 144 0 
C 

• • • • 

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 14 80 
C C C C C 

14 81 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520 
A C C C C C C 

♦ ^ • • • 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560 



1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600 
C A GA 

• • • ' • 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 164 0 
G T 

• • • • • 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680 

C C T C 

• • . . 

1681 TC ATTAGATAATCTACAATC AAGTGATTTTGGTTATTTTG 17 2 0 
C G C C C C C 

• ♦ • • 

17 21 AAAGTGCC AATGCTTTT AC ATCTTC ATTAGGT AATAT AGT 17 60 

• C C C C 

♦ » 

17 61 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800 
G C T C 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 18 4 0 

C G C 

• • » • 

184 1 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 

A TGCG 

1881 GCtGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 
CTGT ACGTCTACA C AGCT G ACTC G CA TG 

1921 G 1921 
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1 GAAAGAATAGAAACTGGTTACACCCCAATCGATATTTCCT 4 0 
ATGGCC T C . T C C C 

« • • • 

41 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 80 
CT G A G GC C C G C G A 

• • • • 

81 TGCTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGA 120 
GCTCC CCC T 

121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160 
C A T C G G 

• • • • 

161 TTGAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAG 200 
G GC GGC G C 

201 GAACCAAGCC ATTTCTAGATTAG AAGG ACT AAGC AATCTT 24 0 
G C G G T G C 

a • • a 

241 TATC AAATTTACGC AGAATCTTTT AGAGAGTGGGAAGCAG 280 
C C T GAGC C C 

281 ATCCTACTAATCCAGC ATTAAGAGAAGAGATGCGTATTCA 320 

C TC CC C G A 

321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 360 
C CTGCA CA 

• • • ■ 

3 61 CTTTTTGC AGTTCAAAATTATCAAGTTCCTCTTTTATCAG 400 

TGC CGCC CGC 

• • • • 

401 TATATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAG 4 4 0 
G C A T C T CC CAGC GC TC 

4 41 AGATGTTTCAGTGTTTGG ACAAAGGTGGGG ATTTGATGCC 4 80 

C AGC G C T 

• • • • 

4 81 GCGACTATCAATAGTCGTTATAATGATTTAACTAGGCTTA 520 
AC C CCCCT G 

521 TTGGCAACTATACAGATTATGCTGTACGCTGGTACAATAC 560 
A C CCCC T T C 

• • • • 

561 GGGATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGG 600 
T C G G C T T . ; 

• • * • 

601 GTAAGGTATAATCAATTTAGAAGAGAATTAACACTAACTG 64 0 
ATACCGCG GCCA 

• • * • 

641 TATTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAG 680 
T G C T GT C C CTCC 
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681 AAGATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720 
CC C T C T G C T C 

• • » • 

721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 7 60 
C T TCTGCCC C 

7 61 TTCGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAG 800 
CTTTCATC G CTCC C 

801 TCC ACATTTG ATGG AT AT ACTTAAC AGTAT AACC ATCT AT 84 0 
C C CCTG C T C 

• . » • • 

841 ACGGATGCTC AT AGGGGTT ATTATT ATTGGTC AGGGC ATC 880 
C CAAGG C TAG. 

■ • . • • 

881 AAATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT 920 
G C C ATA CAGC C G 

921 C ACTTTTCCGCT ATATGG AACT ATGGG AAATGC AGCTCC A 960 
T C T C C C 

961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000 

C TCC 
' • • • ■ 

1001 GAACATTATCGTCCACTTTATATAGAAGACCTTTTAATAT 1040 
CGT CGC CC 

1041 AGGG AT AAATAATC AAC AACT ATCTGTTCTTGACGGG ACA 1080 
CTCCCG TC A 

1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120 
G C C T T C 

1121 TATACAG AAAAAGCGGAACGGTAG ATTCGCTGGATGAAAT 1160 
T G C T CT C 

1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200 
C A C T C C 

• • • • 

1201 AGTCATC GATT AAGCC ATGTTTC AATGTTTCGTTC AGGCT 124 0 
TCC CAGG CGC C CA 

• * • • 

1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280 
C C C TCC G C C C 

1281 CTCTTGGATAC ATCGTAGTGCTG AATTTAATAAT AT AATT 1320 

C G C C C C C 

1321 GCATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAA 1360 
C 

13 61 ACTTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGG ATT 14 00 
C C C C 
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14 01 TACTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAAT 144 0 
C ACC CCCC 

1441 AACATTCAGAAT AGAGGGTATATTGAAGTTCCAATTC ACT 1480 



• • • ' • 

1481 TCCCATCGACATCTACCAGATATCGAGTTCGTGTACGGTA 1520 
C A GA 

« . • • • 

1521 TGCTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGT 1560 

G T 

1561 AATTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTA 1600 

C C . T 

« • • • 

1601 CGTCATTAGATAATCTACAATCAAGTGATTTTGGTTATTT 1640 
CCG C CC C C 

1641 TGAAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATA 1680 

C C C C 

« « • • 

1681 GTAGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAA 1720 
G C T 

• * • • 

1721 TAGACAGATTTGAATTTATTCCAGTTACTGCAACACTCGA 17 60 
C C G C 

1761 GGCTGAA 1767 
G 
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1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 4 0 
CCA C AC 

4 1 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80 
C C G A T C T 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120 
CCT C TC CC 

• • • • 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160 
CTGAG GCCCGCGA 

• • • • 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 2 00 
GCTCC C CC T 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240 
C A T C G G 

2 41 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 2 80 
G GC GGC G C 

2 81 ACCAAGCC ATTTCTAGATTAG AAGG ACTAAGCAATCTTT A 320 

G C G G T G C 

• . • ■ • 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 3 60 
C C T GAGC C C 

• • ■ 4 

3 61 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 4 00 

C TC CC C G A 

• • • • 

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440 
C CTGCACAT 

• • • • 

4 41 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 4 80 

GC CGCC CGCG 

■ • • • 

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520 

C AT C T CC CAGC GC TC 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 5 60 
C AGC G C T 

• • • • 

5 61 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 

AC C CCCCT G 

• • • • 

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 64 0 
A CCCCC TT CT 

641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680 
C G G C T T A 
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681 AAGGTATAATC AATTTAGAAGAG AATT AAC ACTAACTGTA 
TACC GCG GCCAT 

• ' • • • 
721 TTAGATATCGTTGCTCTGTTCCCGAATTATG ATAGTAGAA 

G C T GT C C CTCC 

7 61 GATATCCAATTCGAACAGTTTCCC AATTAACAAGAGAAAT 
CC CTCT G CTC 

• • ■ • 
801 TTATAC AAACC C AGT ATT AGAAAATTTTG ATGGT AGTTTT 

C T TCTGCCC CC 

• ■ > • 
841 CGAGGCTCGGCTCAGGGC ATAGAAAG AAGTATTAGGAGTC 

TTTCATC G CTCC C C 

881 CACATTTGATGGATATACTTAAC AGTATAACCATCTATAC 
C C CT G C T C 

• a • • 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 
C CAAGG C TACG 

• • ■ • 
961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCC AGAATTCA 

C C ATA. CAGC C G T 

• • • • 
1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 

CTC C C 

• • • ■ 
1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 

C ' T C C 

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 
CGT CGC CC C 

• • • « 
1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 

TCCCG TC A 

• • • • 
1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 

G C C T T C T 

1201 TACAGAAAAAGCGG AACGGTAGATTCGCTGGATGAAATAC 
G C T CT C C 

• > • • 
1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 

A C T C CTC 

• • ' • • 
1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 

CCAGG CGC C CAC 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 
C C TCC G C C C 

1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 

C G C C C C C 
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1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 14 40 
C 

• • ■ • 

14 41 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 14 80 
C C C C C 

• • • * 

14 81 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520 

A C C C C C C 

• • • • 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 15 60 

15 61 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600 

C A GA 

• • • 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 164 0 
G T 

• • • • 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680 
C C T C 

1681 TCATTAG ATAATCT ACAATC AAGTG ATTTTGGTTATTTTG 1720 
C G C C C C C 

• * • ■ • 

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 17 60 

C C _ C C 

• • • • 

17 61 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800 
G C T C 

• a m m 

1801 GACAGATTTG A ATTTATTCC AGTT ACTGC AAC ACTCG AGG 184 0 
C G C 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960 

• • > • 

1961 CGTATTTATCGGATG AATTTTGTCTGG ATGAAAAGCGAGA 2000 

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 204 0 
2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080 
2081 ATAGGC AACC AG AACGTGGGTGGGGCGG AAGTAC AGGGAT 2120 
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■ • • • 

2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160 

• • • • 

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200 

• • • • 

2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 224 0 

• ♦ • » 

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280 

2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 232 0 

• * • • 

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360 

• • • « 

2361 TTCAGCCCAAAGTCC AATCGGAAAGTGTGGAGAGC CG AAT 24 00 

2 401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTT AGATT 24 4 0 

• - * * . * 

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480 

2481 TCATTTCTCCTTAGAC ATTGATGTAGGATGTACAGACTT A 2520 

2521 AATG AGG ACCT AGGTGT ATGGGTG ATCTTT AAG ATTAAG A 2560 

• • * ■ 

2561 CGCAAG ATGGGC ACGC AAGACTAGGGAATCTAGAGTTTCT 2 600 

2 601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2 64 0 

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2 680 

• • • • 

2 681 TGGAATGGGAAACAAAT ATCGTTTATAAAGAGGCAAAAGA 2720 

« • • • 

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 27 60 

• • • • 

27 61 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800 

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 284 0 

FIG. 90 
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• • • • 

2841 GCTGTCTGTGATTCCGGGTGTC AATGCGGCT ATTTTTGAA 2880 

« • • • 

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920 

• • • ' • 

2 921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2 960 

• • « • 

2 961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000 

• • • • 

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 304 0 

3041 GGGAAGCAGAAGTGTC AC AAG AAGTTCGTGTCTGTCCGGG 3080 

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120 

« • • • 

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160 

3161 ATAC AG AC G AAC TG AAG TTT AGC AACTGC GT AG AAG AGG A 3200 

- • • • • 

3201 AAtCTATCCAAATAACACGGTAACGTGTAATG ATTATACT 324 0 

• • • • 

3241 GTAAATC AAGAAGAATACGGAGGTGCGTAC ACTTCTCGTA 3280 

3281 ATCGAGG ATAT AACG AAGCTCCTTCCGTACC AGCTGATTA 3320 

• • • • 

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360 

33 61 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400 

34 01 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 34 4 0 
34 41 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 34 80 
34 81 GAAACGG AAGG AAC ATTTATCGTGG AC AGCGTGGAATT AC 3520 
3521 TCCTTATGGAGGAA 3534 
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1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40 
C C A C AC 

• • • • 

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80 
C C G A T C T 

« • • • 

8 1 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120 
CCT C TC CC 

• • • « 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 1 60 
CTGAG GCCCGCGA 

• ' • ♦ • 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200 
GCTCC CCC T 

• . • • ■ 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 24 0 
C A T C G G 

• • " • ■ 

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280 
G GC GGC G C 

• . ■ ■ B 

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320 
G C G G T. G C 

■ ■ • ■ 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360 
C C T GAGC C C 

• • • • 

361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400 
C TC CC C G A 

• • • • 

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 44 0 
C CTGCA CAT 

• • • • 

441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480 
GC CGCC CGCG 

• • • • 

4 81 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520 
C A T C T CC CAGC GC TC 

• • • ♦ 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560 
C AGC G C T 

• • • • 

561 GACTATCAATAGTCGTT ATAATGATTTAACTAGGCTTATT 600 
AC CCCCCT G 

• • • • 

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 64 0 
A CCGCC TT CT 
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641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 
C G G C T T A 

681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 
TACCGCG GCCAT 

7 21 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 
G C T GT C C CTCC 

7 61 G ATATCC AATTCGAAC AGTTTCCC AATTAAC AAGAGAAAT 
CCC TCT G CTC 

■ ■ ■ • 

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 
C T TCTGCCC CC 

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGT ATTAGGAGTC 
TTTCATC G CTCC C C 

881 CACATTTG ATGGAT ATACTTAAC AGTAT AACC ATCTAT AC 
C C CT G C T C 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 
C CAAGG C TACG 

961 ATAATGGCTTCTCCTGT AGGGTTTTCGGGGCCAG AATTC A 
C C ATA CAGC C G T 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 
CTC C C 

1041 ACAACGTATTGTTGCTC AACT AGGTCAGGGCGTGT ATAGA 
C T C C 

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 
CGT CGC CC C 

1121 GG ATAAATAATC AAC AACTATCTGTTCTTG ACGGG AC AGA 
TCCCG TC A 

• • • • 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCC ATCCGCTGTA 
G C C T T C T 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 
G C T CT C C 

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 
A C T C CTC 

1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 
CCA GG CGC C CAC 

1321 AGTAATAGTAGTGTAAGTATAAT AAG AGCTCCTATGTTCT 
C C TCC G C C C 
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• • ♦ • 

1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 14 00 

C G C C C C C 

• • • • 

1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 14 4 0 
C 

• . • • • 

14 41 TTTCTTTTTAATGGTTCTGTAATTTC AGG ACC AGG ATTTA 14 80 
C C C C C 

• • . • * 

14 81 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 152 0 
ACC C C C C 

• • * m . 

1521 CATTC AGAATAGAGGGTATATTG AAGTTCCAATTC ACTTC 15 60 

• • • • 

1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1 600 
C A GA 

• • • • 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 164 0 
G T 

■ • • • 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1 680 
C C T C 

• • • • 

1681 TCATTAGATAATCTACAATC AAGTGATTTTGGTTATTTTG 172 0 
C G C C C C C 

• • • . 

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760 

C C C C 

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800 
G C T C 

• • • • 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 184 0 
C G C 

• • • 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTG AATGC 1880 

• • . ■ 

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 

G C C C G C 

• • • . 

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960 
G C G G 

• » . • 

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000 
C CC CAGC G C 

• • • • 

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 204 0 

• • • . 

204 1 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080 
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• ' • . ■ 

2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 

2121 TACCATCCAAGG AGGGGATGACGTATTTAAAGAAAATTAC 
G TC GCGGC 

2161 GTCAC ACTATC AGGTACCTTTGATGAGTGCTATCCAACAT 

2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 
CCCCGG CGCGG 

224 1 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 

2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAAC ATG 
C C G CC C C 

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 



24 81 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 

2521 AATGAGG ACCT AGGTGT ATGGGTG ATCTTTAAG ATTAAG A 

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 

2641 AAAAG AGCGG AG AAAAAATGG AG AGACAAACGTGAAAAAT 

G G 

2 681 TGGAATGGGAAACAAAT ATCGTTT ATAAAG AGGCAAAAGA 

G C C C C 

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 
27 61 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 
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2801 ATAAACGTGTTC ATAGC ATTCG AGAAGCTTATCTGCCTG A 2840 

2841 GCTGTCTGTGATTCCGGGTGTC AATGCGGCTATTTTTG AA 2880 

• • • ■ 

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2 920 

C C 

. • • • 

2 921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2 960 
CCCGCCCC 

2 961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000 

• • • • 

300 1 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 304 0 

• • • • 

3041 GGG AAGC AG AAGTGTC AC AAG AAGTTCGTGTCTGTCCGGG 3080 

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120 

• * • • 

3121 TATGGAGAAGGTTGCGT AACCATTCATGAG ATCGAGAACA 3160 

3161 ATAC AGACGAACTGAAGTTTAGC AACTGCGT AGAAG AGGA 3200 

• • • • 

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 32 4 0 

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280 

3281 ATCGAGG ATAT AACGAAGCTCCTTCCGTACC AGCTG ATTA 3320 

• • • • 

3321 TGCGTCAGTCT ATGAAG AAAAATCGTATAC AGATGG ACG A 33 60 

• • • • 

3361 AGAG AG AATCCTTGTG AATTT AAC AGAGGGT ATAGGG ATT 3400 

3401 ACACGCCACTACC AGTTGGTTATGTGACAAAAGAATTAG A 3 4 4 0 

34 41 ATACTTCCC AG AAACCG ATAAGGTATGGATTGAGATTGG A 3 480 

34 81 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3 52 0 

3521 TCCTTATGGAGGAA 3534 
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1 ATGGATAACAATCCGAACATC AATGAATGCATTCCTTATA 4 0 
C C A C A C 

• • • • 

4 1 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80 
C C G A T C T 

8 1 AAGAATAGAAACTGGTTAC ACCCC AATCG AT ATTTC CTTG 120 
C C T C T C C C 

121 TCGCTAACGC AATTTCTTTTGAGTG AATTTGTTCCCGGTG 160 
CT G A G GC C C G C G A 

161 CTGGATTTGTGTTAGG ACTAGTTG AT ATAATATGGGGAAT 200 
GCTCC CCC T 

• • » • 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240 
C A T C G G 

2 41 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280 
G GC GGC G C 

2 81 ACCAAGCC ATTTCTAGATTAG AAGGACTAAGCAATCTTTA 320 

G C G G T G C 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 3 60 
e C T GAGC C C 

3 61 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 4 00 

C TC CC C G A 

4 01 TCAATGACATGAACAGTGCCCTTAC AACCGCTATTCCTCT 4 4 0 

C CTGCA CAT 

4 41 TTTTGC AGTTC AAAATTATCAAGTTCCTCTTTTATC ACTA 4 80 
GC CGCC CGCG 

• • • • 

4 81 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520 
C A T C T CC CAGC GC TC 

521 ATGTTTC AGTGTTTGGAC AAAGGTGGGG ATTTG ATGCCGC 560 
C AGC G C T 

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 
AC C CCCCT G 

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 64 0 
A CCCCC TT CT 

641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680 
C G G C T T A 
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681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720 
TACCGCG GCCAT 

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 7 60 
G C T GT C C CTCC 

• * • • 

7 61 GATATCCAATTCGAACAGTTTCCC AATTAACAAGAGAAAT 800 
CCCTCT G CTC 

• • • • 

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 84 0 
C T TCTGCCC CC 

841 CGAGGCTCGGCTC AGGGC ATAG AAAGAAGT ATTAGG AGTC 880 
TTTCATC G CTCC C C 

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 92 0 
C C CT G C T C 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960 
C CAAGG C TACG 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000 
C C ATA CAGC C G T 

• > • ■ 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 104 0 
CTC C C 

• • *~ • • 

1041 ACAACGTATTGTTGCTC AACT AGGTC AGGGCGTGTAT AG A 1080 
C T C C 

1081 ACATTATCGTCC ACTTTATATAGAAGACCTTTTAATATAG 1120 
CGT CGC CC C 

1121 GGATAAATAATC AACAACTATCTGTTCTTGACGGGACAGA 1160 
TCCCG TC A 

• • . •» 

1161 ATTTGCTTATGG AACCTCCTC AAATTTGCC ATCCGCTGTA 1200 
G C C T T C T 

• • • • 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 12 4 0 
G C T CT C C 

• • • • 

12 41 CGCCAC AGAATAACAACGTGCCACCTAGGC AAGGATTTAG 1280 

A C T C CTC 

1281 TCATCG ATTAAGCC ATGTTTCAATGTTTCGTTC AGGCTTT 1320 
CCAGG CGC C CAC 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 13 60 
C C TCC G C C C 

13 61 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 14 00 

C G C C C C C 
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• • • . 

1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440 

• • • • 

14 41 TTTCTTTTT AATGGTTC TGT AATTTC AGG ACC AGG ATTTA 14 80 

C C C C - C 

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520 
A C C C C C C 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560 

• • . • 

1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600 
C A GA 

• • • « 

1 601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 164 0 
G T 

• • . . 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680 
C C T C 

1681 TC ATTAG AT AATCTAC AATC AAGTG ATTTTGGTT ATTTTG 1720 
C G C C C C C 

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 17 60 

C C C C 

• - • - 

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800 
G C T C 

• • . . 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 184 0 
C G C 

• • . a 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 
GCCTG C T C 

1881 GCTGTTTACGTCTAC AAACC AACT AGGGCTAAAAAC AAAT 1920 
CC CCCTGTCTG TC 

• ■ * • • 

1921 GTAACGGATTATCATATTGATC AAGTGTCC AATTTAGTTA 1960 
TTC C C CGC 

• • . . 

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000 
C CC TAGC G C C C C G T 

• • . • 

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040 
CC T CC T CC 

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080 
GAGCCTG CCC C 

2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120 
C G T T C C 
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2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160 
C CCTGCGGC 

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200 
CCCATCC CTC 

• > ■ ■ 

2 201 ATTTGTATC AAAAAATCG ATGAATC AAAATTAAAAGCCTT 22 40 
C CGG GCCC 

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280 
GAG CT CC CC 

2 281 GACTTAGAAATCTATTTAATTCGCTAC AATGCAAAAC ATG 2320 
CT CCGCAG CGG 

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360 
GCG C T CC A 

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGG AGAGCCGAAT 2400 
T TC C T G T C 

2 4 01 CGATGCGCGCCACACCTTGAATGG AATCCTGACTTAGATT 24 4 0 
AT G G C 

24 41 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480 

C C C C G C T 

• ■ • • 

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTAC AGACTTA 2520 
C G C G T C G 

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560 

C A C C C C 

• • • * 

25 61 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2 600 

C C A T C C T 

2 601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2 64 0 

G C T T C 

• • • • 

2 641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2 680 
G A G G G G C 

• • • * • 

2 681 TGG AATGGGAAAC AAATATCGTTTATAAAG AGGC AAAAGA 2720 
C T C C G C 

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 27 60 
GCG GCG C G 

• • • • 

27 61 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800 
G CCCCC' CCC 

2801 ATAAACGTGTTC ATAGCATTCGAG AAGCTT ATCTGCCTGA 2 840 
C G C T G CT 
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• • * • 

2 841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880 
T C CT GCTCCCG 

• ■ • * 

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2 920 
CTGA CTC TGC 

2 92 1 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2 960 
C C CGC CCC 

• • • • 

2 961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000 

C CAG T T G C G G 

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 304 0 
G TG C G GTG 

• * • • 

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080 
T C G A A A 

■ « • • 

3081 TCGTGGCTATATCCTTCGTGTCAC AGCGTAC AAGGAGGG A 3120 
AA CTC GCT 

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160 
C T G G C C 

3161 ATAC AG ACGAACTGAAGTTT AGCAACTGCGT AG AAG AGGA 3200 
C C G T CTC C G A 

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 324 0 

CC CTT CCCC 

3241 GTAAATC AAGAAGAATACGGAGGTGCGTAC ACTTCTCGTA 3 280 
G G G C AGC 

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320 
CA T C T T C 

3321 TGCGTC AGTCTATGAAGAAAAATCGTATACAGATGG ACGA 33 60 
CCGCGG CC CA 

3361 AGAGAG AATCCTTGTG AATTTAAC AGAGGGT ATAGGG ATT 34 00 
CT C CGC TC C 

34 01 ACACGCC ACTACCAGTTGGTTATGTG AC AAAAG AATT AG A 34 4 0 
A T C TCGGCT 

3441 ATACTTCCCAGAAACCG ATAAGGTATGG ATTGAG ATTGG A 34 80 
G TTG CAG C CT 

3 4 81 GAAACGG AAGG AACATTT ATCGTGGAC AGCGTGG AATT AC 3520 

C G C C ^ GC T 
TCCTTATGGAGGAA 3534 
• • • • 

1 ATGACTGCAGATAATAATACGGAAGCACTAGATAGCTCTA 40 
CCCC CCCT 

• - • • 

4 1 CAACAAAAGATGTCATTCAAAAAGGCATTTCCGTAGTAGG 80 
CTG TCGGTC TG 

• • • • 

81 TGATCTCCTAGGCGTAGTAGGTTTCCCGTTTGGTGGAGCG 120 
ACTG GTAT CC C 

. • • • 

121 CTTGTTTCGTTTTATACAAACTTTTTAAATACTATTTGGC 160 
C GAGC C CCCC 

• • • • 

161 CAAGTGAAGACCCGTGGAAGGCTTTTATGGAACAAGTAGA 200 
CG T AAC G T 

• • • • 

201 AGCATTGATGGATCAG AAAATAGCTGATTATGC AAAAAAT 240 
TCT GTA CGC 

• • • ■ 

241 AAAGCTCTTGCAGAGTTACAGGGCCTTCAAAATAATGTCG 280 
GTG ACC GC G 

• • • . ' 

281 AAGATT ATGTGAGTGC ATTGAGTTC ATGGC AAAAAAATCC 320 
G C C TCCAGC G G C 

• ■ ♦ * • ■ , 
321 TGTGAGTTCACGAAATCC ACATAGCC AGGGGCGGATAAGA 3 60 

T C CA T C A TA C 

• • • • 

3 61 GAGCTGTTTTCTC AAGCAGAAAGTCATTTTCGTAATTCAA 400 

T C C TCC C CA A C 

. . • • 

4 01 TGCCTTCGTTTGCAATTTCTGGATACGAGGTTCTATTTCT 4 4 0 

AGC T C C T T C . 

• • • • 

4 41 AACAACATATGCACAAGCTGCCAACACACATTTATTTTTA 4 80 
CTG T CCGCC 

• • • * 

481 CTAAAAGACGCTCAAATTTATGGAGAAGAATGGGGATACG 520 
T G C G 

• . • • 

521 AAAAAG AAG AT ATTGCTG AATTTTATAAAAG AC AACT AAA 5 60 
G GC GCCGCT .T 

• • • • 

561 ACTTACGCAAGAATATACTGACCATTGTGTCAAATGGTAT 600 
G C C G C C G 

. . • • 

601 AATGTTGGATTAGATAAATTAAGAGGTTCATCTTATGAAT 64 0 
C TCC GCC CTCCG 

• « • • 

64 1 CTTGGGTAAACTTTAACCGTTATCGCAGAGAGATGACATT 680 
G C A A CA G C 
• • • ' • 

681. AACAGTATTAGATTTAATTGC ACTATTTCC ATTGTATG AT 7 20 
GTGCCCTC C C C 

721 GTTCGGCTATACCCAAAAGAAGTTAAAACCGAATTAACAA 7 60 
GAAC G G TGCTC 

• • » • 

7 61 GAGACGTTTT AAC AGATC CAATTGTCGG AGTC AACAACCT 800 
GC C T C T 

801 TAGGGGCTATGGAACAACCTTCTCTAATATAGAAAATTAT 840 
T T AGC C C C 

• • • • 

841 ATTCG AAAACC AC ATCTATTTG ACT ATCTGC AT AG AATTC 880 
AG C C T C 

• • • • 

881 AATTTCACACGCGGTTCCAACCAGGATATTATGGAAATGA 920 
C AA T C T C 

• ■ « • 

921 CTCTTTCAATTATTGGTCCGGTAATTATGTTTCAACTAGA 960 
C C C C C 

■ • • • 

961 CCAAGCATAGGATCAAATGATAT AATC AC ATCTCC ATTCT 1000 
T T C C C 

1001 ATGGAAATAAATCCAGTGAACCTGTACAAAATTTAGAATT 1040 
TCG GGCCTG 

• - • • • 

1041. TAATGGAG AAAAAGTCTATAG AGCCGTAGC AAATAC AAAT 1080 
C C C G C C C 

• ■ • • 

1081 CTTGCGGTCTGGCCGTCC GCTGT AT ATTCAGGTGTT AC AA 1120 
CTG A ATC CC 

■ • • • 

1121 AAGTGGAATTTAGCCAAT ATAATG ATC7LAAC AGATGAAGC 1160 
G G TG C GC G 

• • • * 

1161 AAGTAC ACAAACGTACGACTC AAAAAGAAATGTTGGCGCG 1200 
CCCGT CCTC A 

• • • • 

1201 GTCAGCTGGGATTCTATCGATCAATTGCCTCCAGAAACAA 12 40 
TCT C C 

■ • • • 

12 41 C AGATG AACCTCTAGAAAAGGGAT AT AGCC ATC AACTC AA 1280 
C ATGG CC C T 

• • • • 

12 81 TTATGTAATGTGCTTTTT AATGC AGGGTAGT AGAGG AAC A 1320 

C G C G A TCC G C 

• • • • 

1321 ATCCCAGTGTTAACTTGGACACATAAAAGTGTAGACTTTT 13 60 
T G C C GTCC G C 

• « • • 

13 61 TTAACATGATTGATTCGAAAAAAATT AC AC AACTTCCGTT 1400 

C C AGC G G C T C 
• • * . * 

14 01 AGTAAAGGCATATAAGTTACAATCTGGTGCTTCCGTTGTC 14 40 
G G A C C C G 

14 41 GCAGGTCCTAGGTTTACAGGAGGAGATATCATTC AATGCA 1480 

CACT TC CG 

• • • 

14 81 CAGAAAATGGAAGTGCGGCAACTATTTACGTTACACCGGA 1520 
GCCCAT C G T 

■ « • • 

1521 TGTGTCGTACTCTCAAAAATATCGAGCTAGAATTCATTAT 1560 
T G G CA G AC T C 

• • • ■ 

1561 GCTTCTACATCTCAGATAACATTTACACTCAGTTTAGACG 1600 
A CAGC C C C C G T 

• • • . • 

1601 GGGCACC ATTTAATCAAT ACT ATTTCG ATAAAACGATAAA 1640 
A CC C GTCTCGCC 

• • • • 

1641 TAAAGGAGACACATTAACGTATAATTCATTTAATTTAGCA 1680 
C T TC C A C AGC C C G 

• • • • 

1681 AGTTTCAGCACACCATTCGAATTATCAGGGAATAACTTAC 1720 

T C C C C TC T 

1721 AAATAGGCGTCAC AGGATTAAGTGCTGGAGAT AAAGTTTA 17 60 
GC CTCCCC C C 

17 61 TATAGACAAAATTGAATTTATTCCAGTGAAT 17 91 
C C G G C C C 
1 ATG AATAATGTATTGAATAGTGGAAGAACAACTATTT 4 0 
GAG C C C CTC T C C 

• • • • 

4 1 GTGATGCGTATAATGTAGT AGCCC ATGATCC ATTTAGTTT 8 0 
CCAC CCGTC CC 

• • • • 

8 1 TGAAC ATAAATCATTAG ATACCATCCAAAAAG AATGGATG 120 
C C GAGCC C C T T G G G 

• • • • 

121 GAGTGGAAAAGAACAGATCATAGTTTATATGTAGCTCCTG 160 
A C T T C CTC C C C C A 

s • ■ ■ 

161 TAGTCGGAACTGTGTCTAGTTTTTTGCTAAAGAAAGTGGG 200 
GT A CCCCTC GC 

« • • ■ 

201 GAGTCTTATTGGAAAAAGGATATTGAGTGAATTATGGGGG 240 
CTC C C CTC TCC C C T 

2 41 ATAATATTTCCTAGTGGTAGTACAAATCTAATGCAAGATA 280 
C C ATC GTCC T C C 

2 81 TTTTAAGGGAGAC AGAACAATTCCTAAATCAAAGACTTAA 320 

CG C GTCCGCTC 

• ' > • • 

321 TACAGATACCCTTGCTCGTGTAAATGCAGAATTGATAGGG 360 
CT TG AACCTG CT 

• • • * 

3 61 CTCCAAGCGAATATAAGGGAGTTTAATCAACAAGTAGATA 400 

AC TCT CCG GC 

4 01 ATTTTTTAAACCCTACTCAAAACCCTGTTCCTTTATC AAT 4 4 0 

CCGTA GT G CTC 

4 41 AACTTCTTCGGTTAATACAATGGAGCAATTATTTCTAAAT 4 80 
C CGCT CCCCC 

• • • • 

4 81 AGATTACCCCAGTTCCAGATACAAGGATACCAGTTGTTAT 520 

G T T T C C CC 

• • • • 

521 TATTACCTTTATTTGCACAGGCAGCCAATATGCATCTTTC 5 60 
TC T AC C T T C CT G 

5 61 TTTTATTAGAGATGTTATTCTTAATGCAGATGAATGGGGT 600 

CCACTCGCCCTC A 

601 ATTTCAGC AGCAACATTACGTACGT ATCGAG ATTACCTG A 64 0 
C T C TC TA G A CA C T 

• • • • 

641 GAAATTATACAAGAGATTATTCTAATTATTGTATAAATAC 680 
GCCTCT CCC CCC 
• • • • » 

681 GTATCAAACTGCGTTTAGAGGGTTAAACACCCGTTTACAC 
T G C C T AC C T TA GC T 

• • • • 
721 GATATGTTAGAATTTAGAAC ATATATGTTTTT AAATGTAT 

C CTGCGCC CCT CG 

• « • • 
7 61 TTGAATATGTATCC ATTTGGTC ATTGTTTAAATATCAGAG 

G C CAG AGTC C C G C 

801 TCTTATGGTATCTTCTGGCGCTAATTTATATGCTAGCGGT 
CTG GC ACCCC CTCT C 

• • • ■ 
841 AGTGGACCACAGC AGACACAATCATTTACAGCACAAAACT 

A T GAGC C T G 

• • • • 
881 GGCC ATTTTTATATTCTCTTTTCC AAGTTAATTCGAATTA 

C G AGCT G C C C C 

921 TATATTATCTGGTATTAGTGGTACTAGGCTTTCTATTACC 
C TC CAG CTC G C A C C A 

9 61 TTCCCTAATATTGGTGGTTTACCGGGTAGTACTACAACTC 
T C C AC T A CTCC C 

1001 ATTCATTGAATAGTGCCAGGGTTAATTATAGCGGAGGAGT 
AGCC T CTC A G C C T T 

1041 TTCATCTGGTCTCATAGGGGCGACTAATCTCAATCACAAC 
CAGC AT G T T A CT G C 

1081 TTTAATTGCAGCACGGTCCTCCCTCCTTTATCAACACCAT 
C TC C T G A C GAGC G 

1121 TTGTTAG AAGTTGGCTGGATTC AGGTAC AGATCG AG AGGG 
G GTCC T CAGC T C A 

1161 CGTTGCTACCTCT ACGAATTGGCAGACAG AATCCTTTC AA 
A AC A C G C 

• * ■ « 
1201 ACAACTTTAAGTTTAAGGTGTGGTGCTTTTTCAGCCCGTG 

CCTCCTC A CTA 

12 41 GAAATTCAAACTATTTCCCAGATTATTTTATCCGTAATAT 
G C T CCCTAGC 

12 81 TTCTGGGGTTCCTTTAGTT ATTAGAAACG AAG ATCTAAC A 

C T CCC CGT CCC 

1321 AGACCGTTACACTATAACCAAATAAGAAATATAGAAAGTC 
CTACTTC GTG CC GTC 

• • • • 

13 61 CTTCGGGAACACCTGGTGGAGCACGGGCCTATTTGGTATC 

ACTTAAT AATCCCG 
14 01 TGTGCATAACAGAAAAAATAATATCTATGCCGCTAATGAA 14 40 
C GGCC CTCCG 

• • • • 

14 41 AATGGTACTATGATCCATTTGGCGCCAGAAGATTATACAG 1480 
CC TCCTA CT 

14 81 GATTTACTATATCGCCAATACATGCCACTCAAGTGAATAA 1520 

CCCT C TC C 

• • • • 

1521 TCAAACTCGAACATTTATTTCTGAAAAATTTGGiAAATCAA 15 60 
GA CCCCC GC 

• • • • 

15 61 GGTGATTCCTTAAGATTTGAACAAAGCAACACGACAGCTC 1600 

C GGCGTC TCA 

■ • • • 

1601 GTTATACGCTTAGAGGGAATGGAAATAGTTACAATCTTTA 1640 
GCTTG C CC C 

• ♦ • • 

1641 TTTAAGAGTATCTTCAATAGGAAATTCAACTATTCGAGTT 1680 
C G TAGC CTTCCCCT 

• • • • 

1681 ACTATAAACGGTAGAGTTTATACTGTTTCAAATGTTAATA 1720 
CC ACT CACT GC 

17 21 CCACTAC AAATAACGATGG AGTTAATGATAATGGAGCTCG 17 60 
TA GCT C CCC CA 

• • • • 

17 61 TTTTTC AGATATT AATATC GGTT^T ATAGT AGC AAGTG AT 1800 

A CAGC CCCTCCCG CTC C 

• • • « 

1801 AATACTAATGTAACGCTAGATATAAATGTGACATTAAACT 184 0 
C CTTTGCC CCCT 

• • • • 

18 41 CCGGTACTCCATTTGATCTCATGAATATTATGTTTGTGCC 18 80 

T A C C 

• • 

1881 AACTAATCTTCCACCACTTTAT 1902 
C C T T G C 
1 ATGGAGGAAAATAATCAAAATCAATGCATACCTTACAATT 
G C C C T A C 

• . • • 
4 1 GTTTAAGTAATCCTGAAGAAGTACTTTTGG ATGGAGAACG 

CG CA GTGCT 

8 1 GATATCAACTGGTAATTC ATCAATTGATATTTCTCTGTCA 
CT C CT_CCCC CT C 

• • ■ • • 
121 CTTGTTC AGTTTCTGGTATCTAACTTTGT ACC AGGGGGAG 

T G C CAGC C G T T 

161 GATTTTTAGTTGGATTAATAGATTTTGTATGGGGAATAGT 
GCCTC C TCCC TC 

• • • • 
201 TGGCCCTTCTCAATGGGATGCATTTCTAGTAC AAATTGAA 

T A C G G G 

> • • ■ 

2 41 CAATTAATTAATGAAAGAATAGCTGAATTTGCTAGGAATG 

GGCCGGC GCC C 

• • • • 
281 CTGCTATTGCTAATTTAG AAGGATTAGG AAAC AATTTC AA 

CC CG GCTC 

321 TATATATGTGGAAGCATTTAAAGAATGGGAAGAAGATCCT 
CC GCC G GC 

3 61 AATAATCCAGAAACCAGGACCAGAGTAATTGATCGCTTTC 

C G CCTGGCCAACA 

• ■ • . " 
401 GTATACTTG ATGGGCTACTTG AAAGGGACATTCCTTCGTT 

ACTGC C CTGG AT C AC 

4 41 TCGAATTTCTGGATTTGAAGTACCCCTTTT ATCCGTTTAT 

CA C CC TTCG GC 

481 GCTCAAGCGGCCAATCTGCATCTAGCTATATTAAGAGATT 
AT T C C CC TC CA 

521 CTGTAATTTTTGGAGAAAGATGGGGATTGACAACGATAAA 
GC C G G C TC 

■ » ' ■ • 

5 61 TGTC AATGAAAACTATAATAG ACT AATTAGGC AT ATTGAT 

C GTCC TC C C 

60 1 GAATATGCTGATCACTGTGCAAATACGTATAATCGGGGAT 
GCCC TCCCCTC 

64 1 TAAATAATTTACCGAAATCTACGTATCAAGATTGGATAAC 
GCCCTG T -T. 

• • • • 
681 ATATAATCGATTACGGAGAGACTT AACATTGACTGTATTA 

C C CA G GA G CC C A T G 
• • • . 

721 GATATCGCCGCTTTCTTTCCAAACTATGACAATAGG AGAT 7 60 
C T A C G C 

• • ■ • 

761 ATCCAATTCAGCCAGTTGGTCAACTAACAAGGGAAGTTTA 800 
CTCA G TCA C 

• - • . 

801 TACGGACCCATTAATTAATTTTAATCCACAGTTACAGTCT 840 
T CT CCCT G AAG 

• 

841 GTAGCTCAATTACCTACTTTTAACGTTATGGAGAGCAGCC 880 
CC CTCA C C TC 

881 GAATTAGAAATCCTCATTTATTTGATATATTGAATAATCT 920 
TCGCACG C C CC 

• • • . 

921 TACAATCTTTACGGATTGGTTTAGTGTTGGACGCAATTTT 960 
T CC CC GTCC 

• . • 

961 TATTGGGGAGGACATCGAGTAATATCTAGCCTTATAGGAG 1000 
T CA G C C CTCT T 

1001 GTGGTAACATAACATCTCCTATATATGGAAG AG AGGCG AA 1040 
G T C C C T A 

1041 CCAGGAGCCTCCAAGATCCTTTACTTTTAATGGACCGGTA 1080 

A C TAGT C C C C T A C 

1081 TTTAGGACTTTATCAAATCCTACTTTACGATTATTACAGC 1120 
CACGTC CGA GCC 

1121 AACCTTGGCCAGCGCCACCATTTAATTTACGTGGTGTTGA 1160 

T T C CC TA A 

• • - • 

1161 AGGAGTAGAATTTTCTACACCTACAAATAGCTTTACGTAT 1200 
G C T G C T C CTC C T C 

1201 CGAGGAAGAGGTACGGTTGATTCTTTAACTGAATTACCGC 1240 
A T AC CGCCCA 

• • . . 

12 41 CTGAGGATAATAGTGTGCCACCTCGCGAAGGATATAGTCA 1280 
A C C CA G C GTCC 

1281 TCGTTTATGTCATGC AACTTTTGTTCAAAG ATCTGGAACA 1320 
CAGGCC CCGGCTC T 

1321 CCTTTTTTAACAACTGGTGTAGT ATTTTCTTGG ACCGATC 1360 
ACCCTAATGCA T 

• • . . 

1361 GTAGTGCAACTCTTACAAATACAATTGATCCAGAGAGAAT 14 00 
T C T C C G 



FIG. 14B 



U.S. Patent Mar. 19, 1996 sheet 41 of 46 5,500,365 

; 1 4 0 1 TAATCAAATACCTTTAGTGAAAGG ATTTAG AGTTTGGGGG 144 0 
C CAGCGTCCTG A 

14 41 GGCACCTCTGTC ATTACAGGACC AGGATTTAC AGGAGGGG 1480 

AT C C C T 

1481 ATATCCTTCGAAGAAATACCTTTGGTGATTTTGTATCTCT 1520 
T A C T C GAGC 

1521 ACAAGTCAATATTAATTC ACCAATTACCCAAAGATACCGT 1560 
C TCCCT T T 

15 61 TTAAGATTTCGTTACGCTTCCAGTAGGGATGCACGAGTTA 1 600 

C C - G A TTCCC T C TA C 

1601 TAGTATTAACAGGAGCGGCATCCACAGGAGTGGGAGGCCA 1640 
CGCCCCATTCTCTA 

1641 AGTTAGTGTAAATATGCCTCTTCAGAAAACTATGGAAATA . 1 680 
CTCC G C AC G G C 

1681 GGGGAGAACTTAACATCTAGAACATTTAGATATACCGATT 1720 
C G CG CC C C 

1721 TTAGTAATCCTTTTTCATTTAGAGCTAATCCAGATATAAT 17 60 
CTC C CAGT CC T C C T C C 

17 61 TGGGATAAGTGAACAACCTCTATTTGGTGCAGGTTCTATT 1800 
CTC C AT AGC C 

1801 AGTAGCGGTGAACTTTATATAGATAAAATTGAAATTATTC 184 0 
TCATCT C TGCTC G GC 

1841 TAGCAGATGCAACATTTGAAGC AG AATCTG ATTTAG AAAG 1880 
TCCTCCCGTG ACA CC T G 

1881 AGCAC AAAAGGCGGTGAATGCCCTGTTTACTTCTTCCAAT 1920 
C G T C C C CA 

1921 CAAATCGGGTTAAAAACCGATGTGACGGATTATCATATTG 19 60 
GCTCG TACTTC C 

1961 ATCAAGTATCCAATTTAGTGGATTGTTTATCAGATGAATT 2000 
C G C G CACC ACC.TAGC G 

• * • > 

2001 TTGTCTGGATGAAAAGCGAGAATTGTCCGAGAAAGTCAAA 2040 
C CCCG TCC T 

2041 CATGCG AAGCGACTC AGTGATG AGCGG AATTTACTTCAAG 2080 
CC T CCA CCTG 

2081 ATCCAAACTTCAGAGGGATCAATAGACAACCAGACCGTGG 2120 
CT C A AC C G G A 
2121 CTGGAGAGGAAGTACAGATATTACCATCCAAGGAGGAGAT 
TGT CCGGC CC 

• • • • 
2161 GACGTATTCAAAGAG AATTACGTC AC ACTACCGGGTACCG 

TG G C CCTCATT 

• • ■ • 
2201 TTGATG AGTGCt ATCCAACGTATTTATATC AGAAAATAGA 

CC CTCCGC GC 

• ■ • • 
2241 TGAGTCGAAATTAAAAGCTTATACCCGTTATGAATTAAGA 

C CC CTC AG CCT 

• ■ • • 
2281 GGGTAT ATCG AAGATAGTC AAG ACTTAGAAATCTATTTG A 

CC CC CT CC 

" • • • 

2321 TCCGTTACAATGCAAAAC ACG AAATAGTAAATGTGCCAGG 
AG CG GCCG C 

• • • • 

23 61 CACGGGTTCCTT ATGGCCGCTTTC AGCCCAAATGCC AATC 

T T C C A T . TCT C ^ T 

2401 GGAAAGTGTGGAGAACCGAATCGATGCGCGCCAC ACCTTG 
G G T CA T 

24 41 AATGGAATCCTG ATCTAG ATTGTTCCTGCAGAGACGGGG A 

G CT G C C G T C 

• • ■ • 
2481 AAAATGTGCACATC ATTCCC ATCATTTCACCTTGG ATATT 

GG CC T CT CC 

• • • . 
2521 GATGTTGGATGTACAGACTTAAATGAGGACTTAGGTGTAT 

G TCG CCAC 

• ♦ • • 

25 61 GGGTGATATTC AAG ATTAAGACGC AAG ATGGCC ATGC AAG 

C C C C C A C 

• • • • 
2 601 ACTAGGGAAXCTAGAGTTTCTCGAAGAGAAACCATTATTA 

T C C T GG C 

• " . . 
2641 GGGGAAGCACTAGCTCGTGTGAAAAGAGCGGAGAAGAAGT 

T T C G A 

2 681 GGAGAGACAAACGAGAGAAACTGCAGTTGGAAACAAATAT 
G T CG A G T C 

2721 TGTTTATAAAGAGGCAAAAGAATCTGTAGATGCTTTATTT 
C CG C GCG GC 

« • • • 

2 761 GTAAACTCTCAATATGATAGATTAC AAGTGGATACGAACA 
G C CAG G CC C C 

2 801 TCGCCATGATTCATGCGGCAGATAAACGCGTTCATAGAAT 

CCC C TGCC 
• • * • 

2 841 CCGGGAAGCGTATCTGCC AG AGTTGTCTGTGATTCCAGGT 2880 
TTGTCT T C CT 

• • • ■ 

2881 GTCAATGCGGCC ATTTTCGAAGAATTAGAGGGACGTATTT 2920 
OCT C GCT C 

2 921 TTACAGCGTATTCCTTATATGATGCGAGAAATGTCATTAA 2 960 
CATC GC C C C 

2 961 AAATGGCGATTTCAATAATGGCTTATTATGCTGGAACGTG 3000 
G C T C C C CAGC T 

3001 AAAGGTCATGTAGATGTAGAAGAGCAAAACAACCACCGTT 304 0 

GCGGAG TG 

3041 CGGTCCTTGTTATCCCAGAATGGGAGGCAGAAGTGTCACA 3080 
C GGGTG AT C 

3081 AGAGGTTCGTGTCTGTCC AGGTCGTGGCTATATCCTTCGT 3120 
A A A A C T C 

3121 GTCACAGCATATAAAGAGGGATATGGAGAGGGCTGCGTAA 3160 
GCTCG CT T G 

3161 CGATCC ATGAGATCGAAG ACAATAC AGACG AACTG AAATT 3200 
C C GACC GTG 

• • • « 

3201 CAGCAACTGTGTAGAAGAGGAAGTATATCCAAACAACACA 324 0 
TC CCGAAC C C 

3241 GTAACGTGTAAT AATTATACTGGGACTC AAGAAGAATATG 3280 
TTCCGCC TA G GC 

32 81 AGGGTACGTAC ACTTCTCGTAATC AA-GG AT ATG ACG AAGC 3320 
GA G C AGC CAG T CA 

3321 CTATGGTAATAACCCTTCCGTACCAGCTGATTACGCTTCA 3360 
TCC TCXXXXXXXXXXXX T T C T C C 

3361 GTCTATGAAGAAAAATCGTATACAG ATGGACGAAG AGAGA 34 00 
GCGG CC CA CT 

• • • • 

34 01 ATCCTTGTGAATCTAACAGAGGCT ATGGGG ATTACACACC 34 4 0 
C C G TC T CA C 

• • • • 

34 41 ACTACCGGCTGGTTATGTAACAAAGG ATTT AG AGTACTTC 3 4 80 
TATC T C GCT T 

34 81 CCAGAGACCGAT AAGGTATGG ATTG AGATCGGAGAAAC AG 3520 
T CAG C . T C 

• • • • 

3521 AAGGAACATTCATCGTGGATAGCGTGGAATTACTCCTTAT 3560 
G C C GC T T G 
1 AGATCTAGAGGTAATTGTTATGAGTACTGTCGTGGTTAAG 

GATC 

• » • . 

4 1 GGAAACGTCAACGGTGGTGTACAAC AACCTAGAAGGAGGA 
G T A 

• • • • 

8 1 GAAGGCAATCCCTTCGCAGGAGGGCTAAC AGAGTACAGCC 

T A T 

• • • ■ 
121 AGTGGTTATGGTC ACTGCTCCTGGCG AACCCAGGAGGAGG 

GC A A A 

• ■ • • 

161 AGACGCAGAAG AGGAGGCAATCGCAGGTC AAGAAGAACTG 
AG T A 

201 GAGTTCCCAGGGGAAGGGGCTCAAGCGAGACATTCGTGTT 

A AT 

• • • • 

2 41 TACAAAGGACAACCTCGTGGGCAACTCCCAAGGAAGTTTC 



2 81 ACCTTCGGACCAAGTGTATCAGACTGTCCAGCATTC AAGG 

T 

• • • ' • 

321 ATGGAATACTCAAGGCCTACCATGAGTACAAGATC AC AAG 

T 

. • • • « 

3 61 TATCCTTCTTCAGTTCGTCAGCGAGGCCTCTTCCACCTCA 

T G T 

• • • • 

4 01 CCAGGATCCATCGCTTATGAGTTGGACCCACATTGCAAAG 

C AT 

4 41 TATCATCCCTCCAGTCCTACGTCAACAAGTTCCAAATCAC 
T 

• • ' ■ • 

4 81 AAAGGGAGGAGCT AAGACCTATC AAGCTAGG ATGATC AAC 
T T C T 

• • • • 
521 GGAGTAG AATGGC ACGATTCATCTG AGGATC AGTGC AGGA 

T T A 

561 TACTTTGGAAAGGAAGTGGAAAATCTTCAGACCCAGCAGG 
C A G T T 

• • • • 

601 ATCTTTCAGAGTCACCATCAGAGTGGCTCTTCAAAACCCC 

T T A 

641 AAGTAATAGACTCCGGATCAGAGCCTGGTCCAAGCCCACA 

AT 
« • • • 

681 ACCAACACCCACTCCAACTCCCCAAAAGCATGAGCGATTT 720 

• • • » 

7 21 ATTGCTTACGTCGGC ATACCT ATGCTGACC ATTCAAG AAT 7 60 
7 61 TC 7 62 
SYNTHETIC PLANT GENES 

This is a File Wrapper Continuation of application Ser. 
No. 07/476,661, filed Feb. 12, 1990, now abandoned, which 
is a continuation-in-part of U.S. Ser. No. 07/315,355, filed 
Feb. 24, 1989, now abandoned. 

BACKGROUND OF THE INVENTION 

The present invention relates to genetic engineering and 
more particularly to plant transformation in which a plant is 
transformed to express a heterologous gene. 

Although great progress has been made in recent years 
with respect to transgenic plants which express foreign 
proteins such as herbicide resistant enzymes and viral coat 
proteins, very little is known about the major factors affecting 
expression of foreign genes in plants. Several potential 
factors could be responsible in varying degrees for the level 
of protein expression from a particular coding sequence. The 
level of a particular mRNA in the cell is certainly a critical 
factor. 

The potential causes of low steady state levels of mRNA 
due to the nature of the coding sequence are many. First, full 
length RNA synthesis might not occur at a high frequency. 
This could, for example, be caused by the premature termination 
of RNA during transcription or due to unexpected 
mRNA processing during transcription. Second, full length 
RNA could be produced but then processed (splicing, polyA 
addition) in the nucleus in a fashion that creates a nonfunctional 
mRNA. If the RNA is properly synthesized, terminated 
and polyadenylated, it then can move to the cytoplasm 
for translation. In the cytoplasm, mRNAs have distinct half 
lives that are determined by their sequences and by the cell 
type in which they are expressed. Some RNAs are very 
short-lived and some are much more long-lived. In addition, 
there is an effect, whose magnitude is uncertain, of translational 
efficiency on mRNA half-life. In addition, every RNA 
molecule folds into a particular structure, or perhaps family 
of structures, which is determined by its sequence. The 
particular structure of any RNA might lead to greater or 
lesser stability in the cytoplasm. Structure per se is probably 
also a determinant of mRNA processing in the nucleus. 
Unfortunately, it is impossible to predict, and nearly impossible 
to determine, the structure of any RNA (except for 
tRNA) in vitro or in vivo. However, it is likely that dramatically 
changing the sequence of an RNA will have a large 
effect on its folded structure. It is likely that structure per se 
or particular structural features also have a role in determining 
RNA stability. 

Some particular sequences and signals have been identified 
in RNAs that have the potential for having a specific 
effect on RNA stability. This section summarizes what is 
known about these sequences and signals. These identified 
sequences often are A+T rich, and thus are more likely to 
occur in an A+T rich coding sequence such as a B.t. gene. 
The sequence motif ATTTA (or AUUUA as it appears in 
RNA) has been implicated as a destabilizing sequence in 
mammalian cell mRNA (Shaw and Kamen, 1986). No 
analysis of the function of this sequence in plants has been 
done. Many short lived mRNAs have A+T rich 3' untranslated 
regions, and these regions often have the ATTTA 
sequence, sometimes present in multiple copies or as multimers 
(e.g., ATTTATTTA . . . ). Shaw and Kamen showed that 
the transfer of the 3' end of an unstable mRNA to a stable 
RNA (globin or VA1) decreased the stable RNA's half life 
dramatically. They further showed that a pentamer of 
ATTTA had a profound destabilizing effect on a stable 
message, and that this signal could exert its effect whether 
it was located at the 3' end or within the coding sequence. 
However, the number of ATTTA sequences and/or the 
sequence context in which they occur also appear to be 
important in determining whether they function as destabilizing 
sequences. Shaw and Kamen showed that a trimer of 
ATTTA had much less effect than a pentamer on mRNA 
stability and a dimer or a monomer had no effect on stability 
(Shaw and Kamen, 1987). Note that multimers of ATTTA 
such as a pentamer automatically create an A+T rich region. 
This was shown to be a cytoplasmic effect, not nuclear. In 
other unstable mRNAs, the ATTTA sequence may be present 
in only a single copy, but it is often contained in an A+T rich 
region. From the animal cell data collected to date, it appears 
that ATTTA at least in some contexts is important in stability, 
but it is not yet possible to predict which occurrences of 
ATTTA are destabilizing elements or whether any of these 
effects are likely to be seen in plants. 
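
As an illustration only, the kind of sequence scan discussed above (locating ATTTA copies and ATTTATTTA-style multimers in a coding sequence) can be sketched in Python. This sketch is not part of the patent disclosure. The function names are hypothetical, and the multimer logic assumes adjacent copies overlap by one base, as in ATTTATTTA. A hit is only a candidate; as the text notes, sequence alone does not show that a given ATTTA is destabilizing.

```python
import re


def find_motif(seq, motif="ATTTA"):
    """Return 0-based start positions of every occurrence of motif,
    including overlapping ones. RNA input (U) is read as DNA (T)."""
    seq = seq.upper().replace("U", "T")
    # A zero-width lookahead lets the regex report overlapping matches.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() for m in pattern.finditer(seq)]


def longest_multimer(seq, motif="ATTTA"):
    """Return the number of copies in the longest overlapping run of motif.

    Assumption for this sketch: consecutive copies share one base
    (ATTTA + TTTA -> ATTTATTTA), so starts are len(motif) - 1 apart.
    """
    positions = set(find_motif(seq, motif))
    step = len(motif) - 1
    best = 0
    for start in positions:
        if start - step in positions:
            continue  # this copy is inside a run, not at its start
        copies = 1
        while start + copies * step in positions:
            copies += 1
        best = max(best, copies)
    return best


if __name__ == "__main__":
    example = "GGATTTATTTACC"  # hypothetical test sequence, not from the figures
    print(find_motif(example), longest_multimer(example))
```

Under these assumptions, a monomer or dimer would report 1 or 2 copies and a pentamer would report 5, which matches the distinction the passage draws between them.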

Some studies on mRNA degradation in animal cells also 
indicate that RNA degradation may begin in some cases with 
nucleolytic attack in A+T rich regions. It is not clear if these 
cleavages occur at ATTTA sequences. There are also 
examples of mRNAs that have differential stability depending 
on the cell type in which they are expressed or on the 
stage within the cell cycle at which they are expressed. For 
example, histone mRNAs are stable during DNA synthesis 
but unstable if DNA synthesis is disrupted. The 3' end of 
some histone mRNAs seems to be responsible for this effect 
(Pandey and Marzluff, 1987). It does not appear to be 
mediated by ATTTA, nor is it clear what controls the 
differential stability of this mRNA. Another example is the 
differential stability of IgG mRNA in B lymphocytes during 
B cell maturation (Genovese and Milcarek, 1988). A final 
example is the instability of a mutant beta-thalassemic 
globin mRNA. In bone marrow cells, where this gene is 
normally expressed, the mutant mRNA is unstable, while the 
wild-type mRNA is stable. When the mutant gene is 
expressed in HeLa or L cells in vitro, the mutant mRNA 
shows no instability (Lim et al., 1988). These examples all 
provide evidence that mRNA stability can be mediated by 
cell type or cell cycle specific factors. Furthermore, this type 
of instability is not yet associated with specific sequences. 
Given these uncertainties, it is not possible to predict which 
RNAs are likely to be unstable in a given cell. In addition, 
even the ATTTA motif may act differentially depending on 
the nature of the cell in which the RNA is present. Shaw and 
Kamen (1987) have reported that activation of protein 
kinase C can block degradation mediated by ATTTA. 

The addition of a polyadenylate string to the 3' end is 
common to most eucaryotic mRNAs, both plant and animal. 
The currently accepted view of polyA addition is that the 
nascent transcript extends beyond the mature 3' terminus. 
Contained within this transcript are signals for polyadenylation 
and proper 3' end formation. This processing at the 3' 
end involves cleavage of the mRNA and addition of polyA 
to the mature 3' end. By searching for consensus sequences 
near the polyA tract in both plant and animal mRNAs, it has 
been possible to identify consensus sequences that apparently 
are involved in polyA addition and 3' end cleavage. 
The same consensus sequences seem to be important to both 
of these processes. These signals are typically a variation on 
the sequence AATAAA. In animal cells, some variants of 
this sequence that are functional have been identified; in 
plant cells there seems to be an extended range of functional 
sequences (Wickens and Stephenson, 1984; Dean et al., 
1986). Because all of these consensus sequences are variations 
on AATAAA, they all are A+T rich sequences. This 
sequence is typically found 15 to 20 bp before the polyA 
tract in a mature mRNA. Experiments in animal cells 
indicate that this sequence is involved in both polyA addition 
and 3' maturation. Site directed mutations in this 
sequence can disrupt these functions (Conway and Wickens, 
1988; Wickens et al., 1987). However, it has also been 
observed that sequences up to 50 to 100 bp 3' to the putative 
polyA signal are also required; i.e., a gene that has a normal 
AATAAA but has been replaced or disrupted downstream 
does not get properly polyadenylated (Gil and Proudfoot, 
1984; Sadofsky and Alwine, 1984; McDevitt et al., 1984). 
That is, the polyA signal itself is not sufficient for complete 
and proper processing. It is not yet known what specific 
downstream sequences are required in addition to the polyA 
signal, or if there is a specific sequence that has this function. 
Therefore, sequence analysis can only identify potential 
polyA signals. 
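
The sequence analysis referred to above can be illustrated with a short Python sketch. This sketch is not part of the patent disclosure. It reports windows that match AATAAA exactly or differ from it at a limited number of positions. Treating "a variation on AATAAA" as "at most one mismatch" is an assumption made here for illustration, because the text does not say which variants are functional. The function name and parameters are hypothetical.

```python
def polya_candidates(seq, consensus="AATAAA", max_mismatch=1):
    """Return (position, window, mismatches) for each window of seq that
    differs from consensus at no more than max_mismatch positions.

    Positions are 0-based. RNA input (U) is read as DNA (T).
    The mismatch threshold is an illustrative assumption, not a rule
    stated in the text.
    """
    seq = seq.upper().replace("U", "T")
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Count the positions where the window and the consensus disagree.
        mismatches = sum(a != b for a, b in zip(window, consensus))
        if mismatches <= max_mismatch:
            hits.append((i, window, mismatches))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence, not taken from the figures.
    for hit in polya_candidates("GGAATAAAGGCCAATAATCC"):
        print(hit)
```

Every hit is only a potential signal. Consistent with the passage, nothing in such a scan shows whether the required downstream sequences are present.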

In naturally occurring mRNAs that are normally polyadenylated, 
it has been observed that disruption of this process, 
either by altering the polyA signal or other sequences in the 
mRNA, can have profound effects on the level of 
functional mRNA. This has been observed in several naturally 
occurring mRNAs, with results that are gene specific so 
far. There are no general rules that can be derived yet from 
the study of mutants of these natural genes, and no rules that 
can be applied to heterologous genes. Below are four 
examples: 

1. In a globin gene, absence of a proper polyA site leads 
to improper termination of transcription. It is likely, but not 
proven, that the improperly terminated RNA is nonfunc- 
tional and unstable (Proudfoot et al., 1987). 

2. In a globin gene, absence of a functional polyA signal 
can lead to a 100-fold decrease in the level of mRNA 
accumulation (Proudfoot et al., 1987). 

3. A globin gene polyA site was placed into the 3' ends of 
two different histone genes. The histone genes contain a 
secondary structure (stem-loop) near their 3' ends. The 
amount of properly polyadenylated histone mRNA produced 
from these chimeras decreased as the distance between the 
stem-loop and the polyA site increased. Also, the two 
histone genes produced greatly different levels of properly 
polyadenylated mRNA. This suggests an interaction 
between the polyA site and other sequences on the mRNA 
that can modulate mRNA accumulation (Pandey and Marzluff, 
1987). 

4. The soybean leghemoglobin gene has been cloned into 
HeLa cells, and it has been determined that this plant gene 
contains a "cryptic" polyadenylation signal that is active in 
animal cells, but is not utilized in plant cells. This leads to 
the production of a new polyadenylated mRNA that is 
nonfunctional. This again shows that analysis of a gene in 
one cell type cannot predict its behavior in alternative cell 
types (Wiebauer et al., 1988). 

From these examples, it is clear that in natural mRNAs 
proper polyadenylation is important in mRNA accumulation, 
and that disruption of this process can affect mRNA 
levels significantly. However, insufficient knowledge exists 
to predict the effect of changes in a normal gene. In a 
heterologous gene, where we do not know if the putative 
polyA sites (consensus sequences) are functional, it is even 
harder to predict the consequences. However, it is possible 
that the putative sites identified are dysfunctional. That is, 
these sites may not act as proper polyA sites, but instead 
function as aberrant sites that give rise to unstable mRNAs. 

In animal cell systems, AATAAA is by far the most 
common signal identified in mRNAs upstream of the polyA, 
but at least four variants have also been found (Wickens and 
Stephenson, 1984). In plants, not nearly so much analysis 
has been done, but it is clear that multiple sequences similar 
to AATAAA can be used. The plant sites below called major 
or minor refer only to the study of Dean et al. (1986) which 
analyzed only three types of plant gene. The designation of 
polyadenylation sites as major or minor refers only to the 
frequency of their occurrence as functional sites in naturally 
occurring genes that have been analyzed. In the case of 
plants this is a very limited database. It is hard to predict 
with any certainty that a site designated major or minor is 
more or less likely to function partially or completely when 
found in a heterologous gene such as B.t. 
Another type of RNA processing that occurs in the 
nucleus is intron splicing. Nearly all of the work on intron 
processing has been done in animal cells, but some data is 
emerging from plants. Intron processing depends on proper 
5' and 3' splice junction sequences. Consensus sequences for 
these junctions have been derived for both animal and plant 
mRNAs, but only a few nucleotides are known to be 
invariant. Therefore, it is hard to predict with any certainty 
whether a putative splice junction is functional or partially 
functional based solely on sequence analysis. In particular, 
the only invariant nucleotides are GT at the 5' end of the 
intron and AG at the 3' end of the intron. In plants, at every 
nearby position, either within the intron or in the exon 
flanking the intron, all four nucleotides can be found, 
although some positions show some nucleotide preference 
(Brown, 1986; Hanley and Schuler, 1988). 

A plant intron has been moved from a patatin gene into a 
GUS gene. To do this, site directed mutagenesis was performed 
to introduce new restriction sites, and this mutagenesis 
changed several nucleotides in the intron and exon 
sequences flanking the GT and AG. This intron still functioned 
properly, indicating the importance of the GT and AG 
and the flexibility at other nucleotide positions. There are of 
course many occurrences of GT and AG in all genes that do 
not function as intron splice junctions, so there must be some 
other sequence or structural features that identify splice 
junctions. In plants, one such feature appears to be base 
composition per se. Wiebauer et al. (1988) and Goodall et al. 
(1988) have analyzed plant introns and exons and found that 
exons have ~50% A+T while introns have ~70% A+T. 
Goodall et al. (1988) also created an artificial plant intron 
that has consensus 5' and 3' splice junctions and a random 
A+T rich internal sequence. This intron was spliced correctly 
in plants. When the internal segment was replaced by 
a G+C rich sequence, splicing efficiency was drastically 
reduced. These two examples demonstrate that intron recognition 
in plants may depend on very general features: 
splice junctions that have a great deal of sequence diversity 
and A+T richness of the intron itself. This, of course, makes 
it difficult to predict from sequence alone whether any 
particular sequence is likely to function as an active or 
partially active intron for RNA processing. 

B.t. genes, being A+T rich, contain numerous stretches of 
various lengths that have 70% or greater A+T. The number 
of such stretches identified by sequence analysis depends on 
the length of sequence scanned. 
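
The dependence on scan length can be made concrete with a sliding-window sketch in Python. This sketch is not part of the patent disclosure. The function names, the default 70% threshold, and the window sizes are illustrative assumptions, and the example sequence is hypothetical rather than taken from the figures.

```python
def at_fraction(seq):
    """Fraction of bases in seq that are A or T (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def at_rich_windows(seq, window, threshold=0.70):
    """Return 0-based start positions of every window of the given length
    whose A+T fraction is at least threshold."""
    seq = seq.upper()
    return [
        i
        for i in range(len(seq) - window + 1)
        if at_fraction(seq[i:i + window]) >= threshold
    ]


if __name__ == "__main__":
    example = "ATATATGCGCGC"  # hypothetical sequence
    # The same sequence gives different answers at different scan lengths.
    print(at_rich_windows(example, window=6))
    print(at_rich_windows(example, window=10))
```

In this example a short window flags the A+T rich start of the sequence, while a longer window averages it against the G+C rich remainder and flags nothing, which is the effect the paragraph describes.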

As for polyadenylation described above, there are complications 
in predicting what sequences might be utilized as 
splice sites in any given gene. First, many naturally occurring 
genes have alternative splicing pathways that create alternative 
combinations of exons in the final mRNA (Gallega 
and Nadal-Ginard, 1988; Helfman and Ricci, 1988; Tsurushita 
and Korn, 1989). That is, some splice junctions are 
apparently recognized under some circumstances or in certain 
cell types, but not in others. The rules governing this are 
not understood. In addition, there can be an interaction 
between processing paths such that utilization of a particular 
polyadenylation site can interfere with splicing at a nearby 
splice site and vice versa (Adami and Nevins, 1988; Brady 
and Wold, 1988; Marzluff and Pandey, 1988). Again no 
predictive rules are available. Also, sequence changes in a 
gene can drastically alter the utilization of particular splice 
junctions. For example, in a bovine growth hormone gene, 
small deletions in an exon a few hundred bases downstream 
of an intron cause the splicing efficiency of the intron to drop 
from greater than 95% to less than 2% (essentially nonfunctional). 
Other deletions however have essentially no effect 
(Hampson and Rottman, 1988). Finally, a variety of in vitro 
and in vivo experiments indicate that mutations that disrupt 
normal splicing lead to rapid degradation of the RNA in the 
nucleus. Splicing is a multistep process in the nucleus and 
mutations in normal splicing can lead to blockades in the 
process at a variety of steps. Any of these blockades can then 
lead to an abnormal and unstable RNA. Studies of mutants 
of normally processed (polyadenylation and splicing) genes 
are relevant to the study of heterologous genes such as B.t. 
B.t. genes might contain functional signals that lead to the 
production of aberrant nonfunctional mRNAs, and these 
mRNAs are likely to be unstable. But the B.t. genes are 
perhaps even more likely to contain signals that are analogous 
to mutant signals in a natural gene. As shown above 
these mutant signals are very likely to cause defects in the 
processing pathways whose consequence is to produce 
unstable mRNAs. 

It is not known with any certainty what signals RNA
transcription termination in plant or animal cells. Some
studies on animal genes indicate that stretches of
sequence rich in T cause termination by calf thymus RNA
polymerase II in vitro. These studies have shown that the 3'
ends of in vitro terminated transcripts often lie within runs
of T such as T5, T6 or T7. Other identified sites have not
been composed solely of T, but have had one or more other
nucleotides as well. Termination has been found to occur
within the sequences TATTTTTT, ATTCTC, TTCTT
(Dedrick et al., 1987; Reines et al., 1987). In the case of
these latter two, the context in which the sequence is found
has been C+T rich as well. It is not known if this is essential.
Other studies have implicated stretches of A as potential
transcriptional terminators. An interesting example from
SV40 illustrates the uncertainty in defining terminators
based on sequence alone. One potential terminator in SV40
was identified as being A rich and having a region of dyad
symmetry (potential stem-loop) 5' to the A rich stretch.
However, a second terminator identified experimentally
downstream in the same gene was not A rich and included
no potential secondary structure (Kessler et al., 1988). Of
course, due to the A+T content of B.t. genes, they are rich
in runs of A or T that could act as terminators. The
importance of termination to stability of the mRNA is shown by
the globin gene example described above. Absence of a
normal polyA site leads to a failure in proper termination
with a consequent decrease in mRNA.

There is also an effect on mRNA stability due to the
translation of the mRNA. Premature translational termination in
human triose phosphate isomerase leads to instability of the
mRNA (Daar et al., 1988). Another example is the
beta-thalassemic globin mRNA described above that is
specifically unstable in bone marrow cells (Lim et al., 1988). The
defect in this mutant gene is a single base pair deletion at
codon 44 that leads to translational termination (a nonsense
codon) at codon 60. Compared to properly translated normal
globin mRNA, this mutant RNA is very unstable. These
results indicate that an improperly translated mRNA is
unstable. Other work in yeast indicates that proper but poor
translation can have an effect on mRNA levels. A
heterologous gene was modified to convert certain codons to more
yeast preferred codons. An overall 10-fold increase in
protein production was achieved, but there was also about a
3-fold increase in mRNA (Hoekema et al., 1987). This
indicates that more efficient translation can lead to greater
mRNA stability, and that the effect of codon usage can be at
the RNA level as well as the translational level. It is not clear
from codon usage studies which codons lead to poor
translation, or how this is coupled to mRNA stability.

Therefore, it is an object of the present invention to 
provide a method for preparing synthetic plant genes which 
express their respective proteins at relatively high levels 
when compared to wild-type genes. It is yet another object 
of the present invention to provide synthetic plant genes 
which express the crystal protein toxin of Bacillus thuring- 
iensis at relatively high levels. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIGS. 1a-1b illustrate the steps employed in modifying a
wild-type gene to increase expression efficiency in plants.

FIGS. 2a-2c illustrate a comparison of the changes in the
modified B.t.k. HD-1 sequence of Example 1 (lower line)
versus the wild-type sequence of B.t.k. HD-1 which
encodes the crystal protein toxin (upper line).

FIGS. 3a-3c illustrate a comparison of the changes in the
synthetic B.t.k. HD-1 sequence of Example 2 (lower line)
versus the wild-type sequence of B.t.k. HD-1 which
encodes the crystal protein toxin (upper line).

FIGS. 4a-4c illustrate a comparison of the changes in the
synthetic B.t.k. HD-73 sequence of Example 3 (lower line)
versus the wild-type sequence of B.t.k. HD-73 (upper line).

FIG. 5 represents a plasmid map of intermediate plant 
transformation vector cassette pMON893. 

FIG. 6 represents a plasmid map of intermediate plant 
transformation vector cassette pMON900. 

FIG. 7 represents a map for the disarmed T-DNA of A. 
tumefaciens ACO. 

FIGS. 8a-8c illustrate a comparison of the changes in the
synthetic truncated B.t.k. HD-73 gene (amino acids 29-615
with an N-terminal Met-Ala) of Example 3 (lower line)
versus the wild-type sequence of B.t.k. HD-73 (upper line).

FIGS. 9a-9e illustrate a comparison of the changes in the
synthetic/wild-type full length B.t.k. HD-73 sequence of
Example 3 (lower line) versus the wild-type full-length
sequence of B.t.k. HD-73 (upper line).

FIGS. 10a-10e illustrate a comparison of the changes in
the synthetic/modified full length B.t.k. HD-73 sequence of
Example 3 (lower line) versus the wild-type full-length
sequence of B.t.k. HD-73 (upper line).

FIGS. 11a-11e illustrate a comparison of the changes in
the fully synthetic full-length B.t.k. HD-73 sequence of
Example 3 (lower line) versus the wild-type full-length
sequence of B.t.k. HD-73 (upper line).

FIGS. 12a-12c illustrate a comparison of the changes in
the synthetic B.t.t. sequence of Example 5 (lower line)
versus the wild-type sequence of B.t.t. which encodes the
crystal protein toxin (upper line).

FIGS. 13a-13c illustrate a comparison of the changes in
the synthetic B.t. P2 sequence of Example 6 (lower line)
versus the wild-type sequence of B.t.k. HD-1 which encodes
the P2 protein toxin (upper line).

FIGS. 14a-14e illustrate a comparison of the changes in
the synthetic B.t. entomocidus sequence of Example 7
(lower line) versus the wild-type sequence of B.t.
entomocidus which encodes the Btent protein toxin (upper line).

FIG. 15 illustrates a plasmid map for plant expression 
cassette vector pMON744. 

FIGS. 16a-16b illustrate a comparison of the changes in
the synthetic potato leaf roll virus (PLRV) coat protein
sequence of Example 9 (lower line) versus the wild-type
coat protein sequence of PLRV (upper line).

The present invention provides a method for preparing
synthetic plant genes which genes express their protein
product at levels significantly higher than the wild-type
genes which were commonly employed in plant
transformation heretofore. In another aspect, the present invention
also provides novel synthetic plant genes which encode
non-plant proteins. For brevity and clarity of description, the
present invention will be primarily described with respect to
the preparation of synthetic plant genes which encode the
crystal protein toxin of Bacillus thuringiensis (B.t.). Suitable
B.t. subspecies include, but are not limited to, B.t. kurstaki
HD-1, B.t. kurstaki HD-73, B.t. sotto, B.t. berliner, B.t.
thuringiensis, B.t. tolworthi, B.t. dendrolimus, B.t. alesti, B.t.
galleriae, B.t. aizawai, B.t. subtoxicus, B.t. entomocidus, B.t.
tenebrionis and B.t. san diego. However, those skilled in the
art will recognize and it should be understood that the
present method may be used to prepare synthetic plant genes
which encode non-plant proteins other than the crystal
protein toxin of B.t. as well as plant proteins (see for instance,
Example 9).

The expression of B.t. genes in plants is problematic.
Although the expression of B.t. genes in plants at
insecticidal levels has been reported, this accomplishment has not
been straightforward. In particular, the expression of a
full-length lepidopteran specific B.t. gene (comprising DNA
from a B.t.k. isolate) has been reported to be unsuccessful in
yielding insecticidal levels of expression in some plant
species (Vaeck et al., 1987 and Barton et al., 1987).

It has been reported that expression of the full-length gene
from B.t.k. HD-1 was detectable in tomato plants but that
truncated genes led to a higher frequency of insecticidal
plants with an overall higher level of expression. Truncated
genes of B.t. berliner also led to a higher frequency of
insecticidal plants in tobacco (Vaeck et al., 1987). On the
other hand, insecticidal plants were provided from lettuce
transformants using a full-length gene.



It has also been reported that the full length gene from
B.t.k. HD-73 gave some insecticidal effect in tobacco
(Adang et al., 1987). However, the B.t. mRNA detected in
these plants was only 1.7 kb compared to the expected 3.7
kb, indicating improper expression of the gene. It was
suggested that this truncated mRNA was too short to encode
a functional truncated toxin, but there must have been a low
level of longer mRNA in some plants or no insecticidal
activity would have been observed. Others have reported in
a publication that they observed a large amount of shorter
than expected mRNA from a truncated B.t.k. gene, but some
mRNA of the expected size was also observed. In fact, it was
suggested that expression of the full length gene is toxic to
tobacco callus (Barton et al., 1987). The above illustrates
that lepidopteran type B.t. genes are poorly expressed in
plants compared to other chimeric genes previously
expressed from the same promoter cassettes.

The expression of B.t.t. in tomato and potato is at levels
similar to that of B.t.k. (i.e., poor). B.t.t. and B.t.k. genes
share only limited sequence homology, but they share many
common features in terms of base composition and the
presence of particular A+T rich elements.

All reports in the field have noted the lower than expected
expression of B.t. genes in plants. In general, insecticidal
efficacy has been measured using insects very sensitive to
B.t. toxin such as tobacco hornworm. Although it has been
possible to obtain plants totally protected against tobacco
hornworm, it is important to note that hornworm is up to 500
fold more sensitive to B.t. toxin than some agronomically
important insect pests such as beet armyworm. It is therefore
of interest to obtain transgenic plants that are protected
against all important lepidopteran pests (or against Colorado
potato beetle in the case of B.t. tenebrionis), and in addition
to have a level of expression that provides an additional
safety margin over and above the efficacious protection
level. It is also important to devise plant genes which
function reproducibly from species to species, so that insect
resistant plants can be obtained in a predictable fashion.

In order to achieve these goals, it is important to
understand the nature of the poorer than expected expression of
B.t. genes in plants. The level of stable B.t. mRNA in plants
is much lower than expected. That is, compared to other
coding sequences driven by the same promoter, the level of
B.t. mRNA measured by Northern analysis or nuclease
protection experiments is much lower. For example, tomato
plant 337 (Fischhoff et al., 1987) was selected as the best
expressing plant with pMON9711 which contains the B.t.k.
HD-1 KpnI fragment driven by the CaMV 35S promoter and
contains the NOS-NPTII-NOS selectable marker gene. In
this plant the level of B.t. mRNA is between 100 and 1000 fold
lower than the level of NPTII mRNA, even though the 35S
promoter is approximately 50-fold stronger than the NOS
promoter (Sanders et al., 1987).

The level of B.t. toxin protein detected in plants is
consistent with the low level of B.t. mRNA. Moreover, the
insecticidal efficacy of the transgenic plants correlates with
the B.t. protein level, indicating that the toxin protein
produced in plants is biologically active. Therefore, the low
level of B.t. toxin expression may be the result of the low
levels of B.t. mRNA.

Messenger RNA levels are determined by the rate of
synthesis and rate of degradation. It is the balance between
these two that determines the steady state level of mRNA.
The rate of synthesis has been maximized by the use of the
CaMV 35S promoter, a strong constitutive plant expressible
promoter. The use of other plant promoters such as nopaline
synthase (NOS), mannopine synthase (MAS) and ribulose
bisphosphate carboxylase small subunit (RUBISCO) has
not led to dramatic changes in the levels of B.t. toxin protein
expression, indicating that the effects determining B.t. toxin
protein levels are promoter independent. These data imply
that the coding sequences of DNA genes encoding B.t. toxin
proteins are somehow responsible for the poor expression
level, and that this effect is manifested by a low level of
accumulated stable mRNA.

Lower than expected levels of mRNA have been observed
with four different lepidopteran specific genes (two from
B.t.k. HD-1; B.t. berliner and B.t.k. HD-73) as well as the
gene from the coleopteran specific B.t. tenebrionis. It
appears that for lepidopteran type B.t. genes these effects are
manifest more strongly in the full length coding sequences
than in the truncated coding sequences. These effects are
seen across plant species although their magnitude seems
greater in some plant species such as tobacco.

The nature of the coding sequences of B.t. genes
distinguishes them from plant genes as well as many other
heterologous genes expressed in plants. In particular, B.t.
genes are very rich (~62%) in adenine (A) and thymine (T)
while plant genes and most bacterial genes which have been
expressed in plants are on the order of 45-55% A+T. The
A+T content of the genome (and thus the genes) of any
organism is a feature of that organism and reflects its
evolutionary history. While within any one organism genes have
similar A+T content, the A+T content can vary tremendously
from organism to organism. For example, some Bacillus
species have among the most A+T rich genomes while some
Streptomyces species are among the least A+T rich genomes
(~30 to 35% A+T).

Due to the degeneracy of the genetic code and the limited
number of codon choices for any amino acid, most of the
"excess" A+T of the structural coding sequences of some
Bacillus species is found in the third position of the codons.
That is, genes of some Bacillus species have A or T as the
third nucleotide in many codons. Thus A+T content in part
can determine codon usage bias. In addition, it is clear that
genes evolve for maximum function in the organism in
which they evolve. This means that particular nucleotide
sequences found in a gene from one organism, where they
may play no role except to code for a particular stretch of
amino acids, have the potential to be recognized as gene
control elements in another organism (such as
transcriptional promoters or terminators, polyA addition sites, intron
splice sites, or specific mRNA degradation signals). It is
perhaps surprising that such misread signals are not a more
common feature of heterologous gene expression, but this
can be explained in part by the relatively homogeneous A+T
content (~50%) of many organisms. This A+T content plus
the nature of the genetic code put clear constraints on the
likelihood of occurrence of any particular oligonucleotide
sequence. Thus, a gene from E. coli with a 50% A+T content
is much less likely to contain any particular A+T rich
segment than a gene from B. thuringiensis.
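
As a purely illustrative sketch, the overall A+T content of a coding
sequence and its A+T content at each codon position can be
computed as follows. The function names and the example sequence
are assumptions made for this sketch; they are not taken from the
original disclosure.

```python
def at_fraction(seq):
    """Fraction of A and T bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("A") + seq.count("T")) / len(seq)


def at_by_codon_position(coding_seq):
    """Return the A+T fraction at codon positions 1, 2 and 3."""
    coding_seq = coding_seq.upper()
    # Ignore any trailing bases that do not form a complete codon.
    usable = len(coding_seq) - len(coding_seq) % 3
    return [at_fraction(coding_seq[pos:usable:3]) for pos in range(3)]


# Hypothetical in-frame fragment with A or T in every third position.
fragment = "GCAGCTGGA"
print(at_fraction(fragment), at_by_codon_position(fragment))
```

A third-position value well above the first- and second-position
values is the pattern the passage above attributes to A+T rich
Bacillus coding sequences.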

As described above, the expression of B.t. toxin protein in
plants has been problematic. Although the observations
made in other systems described above offer the hope of a
means to elevate the expression level of B.t. toxin proteins
in plants, the success obtained by the present method is quite
unexpected. Indeed, inasmuch as it has been recently
reported that expression of the full-length B.t.k. toxin
protein in tobacco makes callus tissue necrotic (Barton et al.,
1987), one would reasonably expect high level
expression of B.t. toxin protein to be unattainable due to the
reported toxicity effects.

In its most rigorous application, the method of the present
invention involves the modification of an existing structural
coding sequence ("structural gene") which codes for a
particular protein by removal of ATTTA sequences and
putative polyadenylation signals by site directed
mutagenesis of the DNA comprising the structural gene. It is most
preferred that substantially all the polyadenylation signals
and ATTTA sequences are removed although enhanced
expression levels are observed with only partial removal of
either of the above identified sequences. Alternately, if a
synthetic gene is prepared which codes for the expression of
the subject protein, codons are selected to avoid the ATTTA
sequence and putative polyadenylation signals. For purposes
of the present invention putative polyadenylation signals
include, but are not necessarily limited to, AATAAA,
AATAAT, AACCAA, ATATAA, AATCAA, ATACTA,
ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT,
AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.
In replacing the ATTTA sequences and polyadenylation
signals, codons are preferably utilized which avoid the
codons which are rarely found in plant genomes.
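
The identification of ATTTA sequences and putative
polyadenylation signals in a structural gene can be sketched in
software as shown below. This is an illustrative sketch only: the
signal list is the one recited above, while the function name
find_motifs and the example sequences are assumptions introduced
here.

```python
import re

ATTTA = "ATTTA"

# Putative polyadenylation signals as listed in the text above.
PUTATIVE_POLYA_SIGNALS = [
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA",
    "ATAAAA", "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA",
    "ATTAAA", "AATTAA", "AATACA", "CATAAA",
]


def find_motifs(seq, motifs):
    """Return sorted (position, motif) pairs, including overlapping matches."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        # A zero-width lookahead lets finditer report overlapping sites.
        for match in re.finditer("(?=" + motif + ")", seq):
            hits.append((match.start(), motif))
    return sorted(hits)


# Hypothetical sequences used only to exercise the scan.
print(find_motifs("GGAATAAAGG", PUTATIVE_POLYA_SIGNALS))
print(find_motifs("CATTTATTTAC", [ATTTA]))
```

Positions are zero-based offsets into the scanned strand; only the
strand supplied is examined.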

Another embodiment of the present invention, repre- 
sented in the flow diagram of FIG. 1, employs a method for 
the modification of an existing structural gene or alternately 
the de novo synthesis of a structural gene which method is 
somewhat less rigorous than the method first described 
above. Referring to FIG. 1, the selected DNA sequence is 
scanned to identify regions with greater than four consecu- 
tive adenine (A) or thymine (T) nucleotides. The A+T 
regions are scanned for potential plant polyadenylation 
signals. Although the absence of five or more consecutive A
or T nucleotides eliminates most plant polyadenylation
signals, if more than one of the minor polyadenylation
signals is identified within ten nucleotides of each other,
then the nucleotide sequence of this region is preferably
altered to remove these signals while maintaining the
original encoded amino acid sequence.

The second step is to consider the 15 to 30 nucleotide
regions surrounding the A+T rich region identified in step
one. If the A+T content of the surrounding region is less than
80%, the region should be examined for polyadenylation
signals. Alteration of the region based on polyadenylation
signals is dependent upon (1) the number of polyadenylation
signals present and (2) presence of a major plant
polyadenylation signal.
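
The first two steps described above can be sketched as follows.
This is a hedged illustration under stated assumptions: a run is
taken to be five or more consecutive bases drawn from A and T in
any mixture, the surrounding region is taken as a fixed number of
bases on each side of the run (15 by default), and the function
names are hypothetical. None of these choices is specified in this
form by the text.

```python
import re


def at_runs(seq, min_len=5):
    """Step one: (start, end) spans with min_len or more consecutive A/T bases."""
    pattern = "[AT]{%d,}" % min_len
    return [m.span() for m in re.finditer(pattern, seq.upper())]


def flanking_at_content(seq, span, flank=15):
    """Step two: A+T fraction of the run plus `flank` bases on each side."""
    seq = seq.upper()
    start = max(0, span[0] - flank)
    end = min(len(seq), span[1] + flank)
    region = seq[start:end]
    return (region.count("A") + region.count("T")) / len(region)


# Hypothetical sequence; regions below 80% A+T would next be examined
# for polyadenylation signals according to the text above.
sequence = "GGCCGGAATTAGGCCGGCATTTAAAGC"
for span in at_runs(sequence):
    print(span, round(flanking_at_content(sequence, span), 2))
```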

The extended region is examined for the presence of plant 
polyadenylation signals. The polyadenylation signals are 
removed by site-directed mutagenesis of the DNA sequence. 
The extended region is also examined for multiple copies of 
the ATTTA sequence which are also removed by mutagen- 
esis. 

It is also preferred that regions comprising many
consecutive A+T bases or G+C bases are disrupted since these
regions are predicted to have a higher likelihood of forming
hairpin structures due to self-complementarity. Therefore,
insertion of heterogeneous base pairs would reduce the
likelihood of forming self-complementary secondary
structures, which are known to inhibit transcription and/or
translation in some organisms. In most cases, the adverse
effects may be minimized by using sequences which do not
contain more than five consecutive A+T or G+C.
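
A check for stretches of more than five consecutive A+T or G+C
bases might be sketched as shown below. The function name and the
example sequence are assumptions made for illustration; the limit
of five is taken from the preceding paragraph.

```python
import re


def long_homogeneous_runs(seq, max_run=5):
    """Return (start, end, kind) for A/T-only or G/C-only runs longer than max_run."""
    seq = seq.upper()
    runs = []
    for kind, letters in (("A+T", "AT"), ("G+C", "GC")):
        pattern = "[%s]{%d,}" % (letters, max_run + 1)
        for m in re.finditer(pattern, seq):
            runs.append((m.start(), m.end(), kind))
    return sorted(runs)


# Hypothetical sequence containing one run of each kind.
print(long_homogeneous_runs("ATATATGCGCGCA"))
```

Each reported span is a candidate for disruption by a heterogeneous
base, provided the encoded amino acid sequence is maintained.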

SYNTHETIC OLIGONUCLEOTIDES FOR
MUTAGENESIS

The oligonucleotides used in the mutagenesis are
designed to maintain the proper amino acid sequence and
reading frame and preferably to not introduce common
restriction sites such as BglII, HindIII, SacI, KpnI, EcoRI,
NcoI, PstI and SalI into the modified gene. These restriction
sites are found in multilinker insertion sites of cloning
vectors such as plasmids pUC118 and pMON7258. Of
course, the introduction of new polyadenylation signals,
ATTTA sequences or consecutive stretches of more than five
A+T or G+C should also be avoided. The preferred size for
the oligonucleotides is around 40-50 bases, but fragments
ranging from 18 to 100 bases have been utilized. In most
cases, a minimum of 5 to 8 base pairs of homology to the
template DNA on both ends of the synthesized fragment is
maintained to ensure proper hybridization of the primer to
the template. The oligonucleotides should avoid sequences
longer than five base pairs A+T or G+C. Codons used in the
replacement of wild-type codons should preferably avoid the
TA or CG doublet wherever possible. Codons are selected
from a plant preferred codon table (such as Table I below)
so as to avoid codons which are rarely found in plant
genomes, and efforts should be made to select codons to
preferably adjust the G+C content to about 50%.
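
Two of the oligonucleotide design checks described above can be
sketched as follows. The enzyme names come from the text; the
recognition sequences are the standard ones for those enzymes and
are supplied here as an addition, and the function names and
example sequences are hypothetical. Because all eight sites are
palindromic, scanning one strand is sufficient.

```python
# Standard recognition sequences for the enzymes named in the text.
RESTRICTION_SITES = {
    "BglII": "AGATCT",
    "HindIII": "AAGCTT",
    "SacI": "GAGCTC",
    "KpnI": "GGTACC",
    "EcoRI": "GAATTC",
    "NcoI": "CCATGG",
    "PstI": "CTGCAG",
    "SalI": "GTCGAC",
}


def restriction_sites_present(seq):
    """Return the names of listed enzymes whose site occurs in seq."""
    seq = seq.upper()
    return sorted(name for name, site in RESTRICTION_SITES.items() if site in seq)


def count_doublets(seq):
    """Count TA and CG dinucleotides anywhere in seq."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return {"TA": pairs.count("TA"), "CG": pairs.count("CG")}


# Hypothetical candidate oligonucleotide fragments.
print(restriction_sites_present("TTGAATTCAAGCTTGG"))
print(count_doublets("TACGTA"))
```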



TABLE I

Preferred Codon Usage in Plants

Amino Acid    Codon    Percent Usage in Plants

ARG           CGA        7
              CGC       11
              CGG        5
              CGU       25
              AGA       29
              AGG       23
LEU           CUA        8
              CUC       20
              CUG       10
              CUU       28
              UUA        5
              UUG       30
SER           UCA       14
              UCC       26
              UCG        3
              UCU       21
              AGC       21
              AGU       15
THR           ACA       21
              ACC       41
              ACG        7
              ACU       31
PRO           CCA       45
              CCC       19
              CCG        9
              CCU       26
ALA           GCA       23
              GCC       32
              GCG        3
              GCU       41
GLY           GGA       32
              GGC       20
              GGG       11
              GGU       37
ILE           AUA       12
              AUC       45
              AUU       43
VAL           GUA        9
              GUC       20
              GUG       28
              GUU       43
LYS           AAA       36
              AAG       64
ASN           AAC       72
              AAU       28
GLN           CAA       64
              CAG       36
HIS           CAC       65
              CAU       35
GLU           GAA       48
              GAG       52
ASP           GAC       48
              GAU       52
TYR           UAC       68
              UAU       32
CYS           UGC       78
              UGU       22
PHE           UUC       56
              UUU       44
MET           AUG      100
TRP           UGG      100



Regions with many consecutive A+T bases or G+C bases
are predicted to have a higher likelihood to form hairpin
structures due to self-complementarity. Disruption of these
regions by the insertion of heterogeneous base pairs is
preferred and should reduce the likelihood of the formation
of self-complementary secondary structures such as hairpins
which are known in some organisms to inhibit transcription
(transcriptional terminators) and translation (attenuators).
However, it is difficult to predict the biological effect of a
potential hairpin forming region.

It is evident to those skilled in the art that while the above
description is directed toward the modification of the DNA
sequences of wild-type genes, the present method can be
used to construct a completely synthetic gene for a given
amino acid sequence. Regions with five or more consecutive
A+T or G+C nucleotides should be avoided. Codons should
be selected avoiding the TA and CG doublets in codons
whenever possible. Codon usage can be normalized against
a plant preferred codon usage table (such as Table I) and the
G+C content preferably adjusted to about 50%. The
resulting sequence should be examined to ensure that there are
minimal putative plant polyadenylation signals and ATTTA
sequences. Restriction sites found in commonly used
cloning vectors are also preferably avoided. However, placement
of several unique restriction sites throughout the gene is
useful for analysis of gene expression or construction of
gene variants.
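
A first-pass design of a completely synthetic coding sequence from
an amino acid sequence can be sketched as shown below. The percent
usage values are those of Table I. Everything else is an assumption
made for illustration: one-letter amino acid codes are used as
dictionary keys, and the selection rule (take the most used codon
that contains no TA or CG doublet, otherwise the most used codon)
is a simplification rather than the procedure of the examples.
Doublets spanning codon boundaries, polyadenylation signals, ATTTA
sequences and restriction sites are not checked by this sketch.

```python
# Percent usage from Table I, keyed by one-letter amino acid code.
PLANT_CODON_USAGE = {
    "R": {"CGA": 7, "CGC": 11, "CGG": 5, "CGU": 25, "AGA": 29, "AGG": 23},
    "L": {"CUA": 8, "CUC": 20, "CUG": 10, "CUU": 28, "UUA": 5, "UUG": 30},
    "S": {"UCA": 14, "UCC": 26, "UCG": 3, "UCU": 21, "AGC": 21, "AGU": 15},
    "T": {"ACA": 21, "ACC": 41, "ACG": 7, "ACU": 31},
    "P": {"CCA": 45, "CCC": 19, "CCG": 9, "CCU": 26},
    "A": {"GCA": 23, "GCC": 32, "GCG": 3, "GCU": 41},
    "G": {"GGA": 32, "GGC": 20, "GGG": 11, "GGU": 37},
    "I": {"AUA": 12, "AUC": 45, "AUU": 43},
    "V": {"GUA": 9, "GUC": 20, "GUG": 28, "GUU": 43},
    "K": {"AAA": 36, "AAG": 64},
    "N": {"AAC": 72, "AAU": 28},
    "Q": {"CAA": 64, "CAG": 36},
    "H": {"CAC": 65, "CAU": 35},
    "E": {"GAA": 48, "GAG": 52},
    "D": {"GAC": 48, "GAU": 52},
    "Y": {"UAC": 68, "UAU": 32},
    "C": {"UGC": 78, "UGU": 22},
    "F": {"UUC": 56, "UUU": 44},
    "M": {"AUG": 100},
    "W": {"UGG": 100},
}


def pick_codon(amino_acid):
    """Most used DNA codon without a TA or CG doublet, else the most used codon."""
    options = PLANT_CODON_USAGE[amino_acid]
    ranked = sorted(options, key=options.get, reverse=True)
    dna = [codon.replace("U", "T") for codon in ranked]
    clean = [c for c in dna if "TA" not in c and "CG" not in c]
    return clean[0] if clean else dna[0]


def back_translate(protein):
    """Build a DNA coding sequence for a protein given in one-letter codes."""
    return "".join(pick_codon(aa) for aa in protein.upper())


def gc_fraction(dna):
    """Fraction of G and C bases in a DNA sequence."""
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


# Hypothetical short peptide used only to exercise the functions.
designed = back_translate("MKLTY")
print(designed, round(gc_fraction(designed), 2))
```

The resulting sequence would still need to be examined for the
other features discussed above and its G+C content adjusted toward
about 50% by substituting alternative codons where needed.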



Plant Gene Construction

The expression of a plant gene which exists in
double-stranded DNA form involves transcription of messenger
RNA (mRNA) from one strand of the DNA by RNA
polymerase enzyme, and the subsequent processing of the
mRNA primary transcript inside the nucleus. This
processing involves a 3' non-translated region which causes the
addition of polyadenylate nucleotides to the 3' end of the RNA.
Transcription of DNA into mRNA is regulated by a region of
DNA usually referred to as the "promoter." The promoter
region contains a sequence of bases that signals RNA
polymerase to associate with the DNA and to initiate the
transcription of mRNA using one of the DNA strands as a
template to make a corresponding strand of RNA.

A number of promoters which are active in plant cells
have been described in the literature. These include the
nopaline synthase (NOS) and octopine synthase (OCS)
promoters (which are carried on tumor-inducing plasmids of
Agrobacterium tumefaciens), the Cauliflower Mosaic Virus
(CaMV) 19S and 35S promoters, the light-inducible
promoter from the small subunit of ribulose bis-phosphate
carboxylase (ssRUBISCO, a very abundant plant
polypeptide) and the mannopine synthase (MAS) promoter (Velten
et al., 1984 and Velten & Schell, 1985). All of these
promoters have been used to create various types of DNA
constructs which have been expressed in plants (see e.g.,
PCT publication WO84/02913 (Rogers et al., Monsanto)).

Promoters which are known or are found to cause
transcription of RNA in plant cells can be used in the present
invention. Such promoters may be obtained from plants or
plant viruses and include, but are not limited to, the
CaMV35S promoter and promoters isolated from plant
genes such as ssRUBISCO genes. As described below, it is
preferred that the particular promoter selected should be
capable of causing sufficient expression to result in the
production of an effective amount of protein.

The promoters used in the DNA constructs (i.e. chimeric
plant genes) of the present invention may be modified, if
desired, to affect their control characteristics. For example,
the CaMV35S promoter may be ligated to the portion of the
ssRUBISCO gene that represses the expression of
ssRUBISCO in the absence of light, to create a promoter
which is active in leaves but not in roots. The resulting
chimeric promoter may be used as described herein. For
purposes of this description, the phrase "CaMV35S"
promoter thus includes variations of CaMV35S promoter, e.g.,
promoters derived by means of ligation with operator
regions, random or controlled mutagenesis, etc.
Furthermore, the promoters may be altered to contain multiple
"enhancer sequences" to assist in elevating gene expression.

The RNA produced by a DNA construct of the present
invention also contains a 5' non-translated leader sequence.
This sequence can be derived from the promoter selected to
express the gene, and can be specifically modified so as to
increase translation of the mRNA. The 5' non-translated
regions can also be obtained from viral RNAs, from
suitable eukaryotic genes, or from a synthetic gene sequence.
The present invention is not limited to the constructs
presented in the following examples. Rather, the non-translated
leader sequence can be part of the 5' end of the
non-translated region of the coding sequence for the virus coat
protein, or part of the promoter sequence, or can be derived
from an unrelated promoter or coding sequence. In any case,
it is preferred that the sequence flanking the initiation site
conform to the translational consensus sequence rules for
enhanced translation initiation reported by Kozak (1984).

The DNA construct of the present invention also contains
a modified or fully-synthetic structural coding sequence
which has been changed to enhance the performance of the
gene in plants. In a particular embodiment of the present
invention the enhancement method has been applied to
design modified and fully synthetic genes encoding the
crystal toxin protein of Bacillus thuringiensis. The structural
genes of the present invention may optionally encode a
fusion protein comprising an amino-terminal chloroplast
transit peptide or secretory signal sequence (see, for instance,
Examples 10 and 11).

The DNA construct also contains a 3' non-translated
region. The 3' non-translated region contains a
polyadenylation signal which functions in plants to cause the addition
of polyadenylate nucleotides to the 3' end of the viral RNA.
Examples of suitable 3' regions are (1) the 3' transcribed,
non-translated regions containing the polyadenylation signal
of Agrobacterium tumor-inducing (Ti) plasmid genes, such
as the nopaline synthase (NOS) gene, and (2) plant genes
like the soybean storage protein (7S) genes and the small
subunit of the RuBP carboxylase (E9) gene. An example of
a preferred 3' region is that from the 7S gene, described in
greater detail in the examples below.

Plant Transformation 

A chimeric plant gene containing a structural coding
sequence of the present invention can be inserted into the
genome of a plant by any suitable method. Suitable plants
for use in the practice of the present invention include, but
are not limited to, soybean, cotton, alfalfa, oilseed rape, flax,
tomato, sugarbeet, sunflower, potato, tobacco, maize, rice
and wheat. Suitable plant transformation vectors include
those derived from a Ti plasmid of Agrobacterium
tumefaciens, as well as those disclosed, e.g., by Herrera-Estrella
(1983), Bevan (1983), Klee (1985) and EPO publication
120,516 (Schilperoort et al.). In addition to plant
transformation vectors derived from the Ti or root-inducing (Ri)
plasmids of Agrobacterium, alternative methods can be used
to insert the DNA constructs of this invention into plant
cells. Such methods may involve, for example, the use of
liposomes, electroporation, chemicals that increase free
DNA uptake, free DNA delivery via microprojectile
bombardment, and transformation using viruses or pollen.

A particularly useful Ti plasmid cassette vector for
transformation of dicotyledonous plants is shown in FIG. 5.
Referring to FIG. 5, the expression cassette pMON893 
consists of the enhanced CaMV35S promoter (EN 35S) and 
the 3' end including polyadenylation signals from a soybean 
gene encoding the alpha-prime subunit of beta-conglycinin. 
Between these two elements is a multilinker containing 
multiple restriction sites for the insertion of genes. 

The enhanced CaMV35S promoter was constructed as
follows. A fragment of the CaMV35S promoter extending
between positions -343 and +9 was previously constructed in
pUC13 by Odell et al. (1985). This segment contains a
region identified by Odell et al. (1985) as being necessary
for maximal expression of the CaMV35S promoter. It was
excised as a ClaI-HindIII fragment, made blunt ended with
DNA polymerase I (Klenow fragment) and inserted into the
HincII site of pUC18. This upstream region of the 35S
promoter was excised from this plasmid as a HindIII-EcoRV
fragment (extending from -343 to -90) and inserted into the
same plasmid between the HindIII and PstI sites. The
enhanced CaMV35S promoter thus contains a duplication of
sequences between -343 and -90 (Kay et al., 1987).

The 3' end of the 7S gene is derived from the 7S gene
contained on the clone designated 17.1 (Schuler et al.,
1982). This 3' end fragment, which includes the polyadenylation
signals, extends from an AvaII site located about 30 bp
upstream of the termination codon for the beta-conglycinin 
gene in clone 17.1 to an EcoRI site located about 450 bp 
downstream of this termination codon. 

The remainder of pMON893 contains a segment of
pBR322 which provides an origin of replication in E. coli
and a region for homologous recombination with the disarmed
T-DNA in Agrobacterium strain ACO (described
below); the oriV region from the broad host range plasmid
RK1; the streptomycin/spectinomycin resistance gene from
Tn7; and a chimeric NPTII gene, containing the CaMV35S 
promoter and the nopaline synthase (NOS) 3' end, which 
provides kanamycin resistance in transformed plant cells. 

Referring to FIG. 6, transformation vector plasmid 
pMON900 is a derivative of pMON893. The enhanced 
CaMV35S promoter of pMON893 has been replaced with 
the 1.5 kb mannopine synthase (MAS) promoter (Velten et
al., 1984). The other segments are the same as plasmid
pMON893. After incorporation of a DNA construct into
plasmid vector pMON893 or pMON900, the intermediate
vector is introduced into A. tumefaciens strain ACO which
contains a disarmed Ti plasmid. Cointegrate Ti plasmid
vectors are selected and used to transform dicotyledonous
plants.

Referring to FIG. 7, A. tumefaciens ACO is a disarmed
strain similar to pTiB6SE described by Fraley et al. (1985).
For construction of ACO the starting Agrobacterium strain
was the strain A208 which contains a nopaline-type Ti
plasmid. The Ti plasmid was disarmed in a manner similar
to that described by Fraley et al. (1985) so that essentially all
of the native T-DNA was removed except for the left border
and a few hundred base pairs of T-DNA inside the left
border. The remainder of the T-DNA extending to a point
just beyond the right border was replaced with a novel piece
of DNA including (from left to right) a segment of pBR322,
the oriV region from plasmid RK2, and the kanamycin
resistance gene from Tn601. The pBR322 and oriV segments
are similar to the segments in pMON893 and provide
a region of homology for cointegrate formation.

The following examples are provided to better elucidate 
the practice of the present invention and should not be
interpreted in any way to limit the scope of the present
invention. Those skilled in the art will recognize that various
modifications, truncations etc. can be made to the methods
and genes described herein while not departing from the
spirit and scope of the present invention. 

Example 1 - Modified B.t.k. HD-1 Gene

Referring to FIG. 2, the wild-type B.t.k. HD-1 gene is
known to be expressed poorly in plants as a full length gene
or as a truncated gene. The G+C content of the B.t.k. gene
is low (37%), containing many A+T rich regions, potential
polyadenylation sites (18 sites; see Table II for the list of
sequences) and numerous ATTTA sequences.

TABLE II

List of Sequences of the Potential Polyadenylation Signals

AATAAA*     AAGCAT
AATAAT*     ATTAAT
AACCAA      ATACAT
ATATAA      AAAATA
AATCAA      ATTAAA**
ATACTA      AATTAA**
ATAAAA      AATACA**
ATGAAA      CATAAA**

*indicates a potential major plant polyadenylation site.
**indicates a potential minor animal polyadenylation site.
All others are potential minor plant polyadenylation sites.

Table III lists the synthetic oligonucleotides designed and
synthesized for the site-directed mutagenesis of the B.t.k.
HD-1 gene.

TABLE III

Mutagenesis Primers for B.t.k. HD-1 Gene

Primer     Length (bp)   Sequence

BTK185     18            TCCCCAGATA ATATCAAC
BTK240     48            GGCTTGATTC CTAGCGAACT CTTCGATTCT CTGGTTGATG
                         AGCTGTTC
BTK462     54            [...] TGGCAGCTTG AACGTACACG GAGAGGAGAG GAAC
BTK669     48            AGTTAGTGTA AGCTCTCTTC TGAACTGGTT GTACCTGATC
                         CAATCTCT
BTK930     39            AGCCATGATC TGGTGACCGG ACCAGTAGTA TTCTCCTCT
BTK1110    32            AGTTGTTGGT TGTTGATCCC GATGTTAAAA GG
BTK1380A   37            GTGATGAAGG GATGATGTTG TTGAACTCAG CACTACG
BTK1380T   100           CAGAAGTTCC AGAGCCAAGA TTAGTAGACT TGGTGAGTGG
                         GATTTGGGTG ATTTGTGATG AAGGGATGAT GTTGTTGAAC
                         TCAGCACTAC GATGTATCCA
BTK1600    27            TGATGTGTGG AACTGAAGGT TTGTGGT



The B.t.k. HD-1 gene (BglII fragment from pMON9921
encoding amino acids 29-607 with a Met-Ala at the N-terminus)
was cloned into pMON7258 (pUC118 derivative
which contains a BglII site in the multilinker cloning region)
at the BglII site resulting in pMON5342. The orientation of
the B.t.k. gene was chosen so that the opposite strand
(negative strand) was synthesized in filamentous phage
particles for the mutagenesis. The procedure of Kunkel
(1985) was used for the mutagenesis using plasmid
pMON5342 as starting material.

The regions for mutagenesis were selected in the following
manner. All regions of the DNA sequence of the B.t.k.
gene were identified which contained five or more consecutive
base pairs which were A or T. These were ranked in
terms of length and highest percentage of A+T in the
surrounding sequence over a 20-30 base pair region. The
DNA was then analysed for regions which might contain
polyadenylation sites (see Table II above) or ATTTA
sequences. Oligonucleotides were designed which maximized
the elimination of A+T consecutive regions which
contained one or more polyadenylation sites or ATTTA
sequences. Two potential plant polyadenylation sites were
rated more critical (see Table II) based on published reports.
Codons were selected which increased G+C content, did not
generate restriction sites for enzymes useful for cloning and
assembly of the modified gene (BamHI, BglII, SacI, NcoI,
EcoRV) and did not contain the doublets TA or GC which
have been reported to be infrequently found in codons in
plants. The oligonucleotides were at least 18 bp long ranging
up to 100 base pairs and contained at least 5-8 base pairs of
direct homology to native sequences at the ends of the
fragments for efficient hybridization and priming in site-directed
mutagenesis reactions. FIG. 2 compares the wild-type
B.t.k. HD-1 gene sequence with the sequence which
resulted from the modifications by site-directed mutagenesis.
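
The screening step described above (runs of five or more consecutive A or T bases, the Table II polyadenylation signals, and ATTTA sequences) can be illustrated with a short script. This is only a sketch under stated assumptions: the function names and the toy sequence are hypothetical and do not come from the specification, and the ranking by surrounding A+T percentage is not reproduced.

```python
import re

# Potential polyadenylation signals as listed in Table II.
POLYA_SIGNALS = [
    "AATAAA", "AAGCAT", "AATAAT", "ATTAAT", "AACCAA", "ATACAT",
    "ATATAA", "AAAATA", "AATCAA", "ATTAAA", "ATACTA", "AATTAA",
    "ATAAAA", "AATACA", "ATGAAA", "CATAAA",
]


def at_runs(seq, min_len=5):
    """Return (start, run) for each stretch of >= min_len consecutive A/T bases."""
    pattern = r"[AT]{%d,}" % min_len
    return [(m.start(), m.group()) for m in re.finditer(pattern, seq.upper())]


def motif_hits(seq, motifs):
    """Return sorted (start, motif) pairs, counting overlapping occurrences."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        pos = seq.find(motif)
        while pos != -1:
            hits.append((pos, motif))
            pos = seq.find(motif, pos + 1)
    return sorted(hits)


def gc_percent(seq):
    """Percentage of bases in seq that are G or C."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


# Hypothetical toy sequence, not taken from the B.t.k. gene.
toy = "GGCAATAAATTTACCGGATTTAGC"
print(at_runs(toy))                    # A/T-rich stretches
print(motif_hits(toy, POLYA_SIGNALS))  # Table II signals
print(motif_hits(toy, ["ATTTA"]))      # ATTTA sequences
print(gc_percent(toy))                 # G+C content in percent
```

Positions are zero-based offsets into the sequence. A real analysis would run the same functions over the full coding sequence of FIG. 2.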

The end result of these changes was to increase the G+C
content of the B.t.k. gene from 37% to 41% while also decreasing
the potential plant polyadenylation sites from 18 to 7 and
decreasing the ATTTA regions from 13 to 7. Specifically, the
mutagenesis changes from the amino (5') terminus to the carboxy
(3') terminus are as follows:

BTK185 is an 18-mer used to eliminate a plant polyade- 
nylation site in the midst of a nine base pair region of A+T. 
BTK240 is a 48-mer. Seven base pairs were changed by
this oligonucleotide to eliminate three potential polyadenylation
sites (2 AACCAA, 1 AATTAA). Another region close
to the region altered by BTK240, starting at bp 312, had a
high A+T content (13 of 15 base pairs) and an ATTTA
region. However, it did not contain a potential polyadenylation
site and its longest string of uninterrupted A+T was
seven base pairs.

BTK462 is a 54-mer introducing 13 base pair changes. 
The first six changes were to reduce the A+T richness of the 
gene by replacing wild-type codons with codons containing 
G and C while avoiding the CG doublet. The next seven 
changes made by BTK462 were used to eliminate an A+T
rich region (13 of 14 base pairs were A or T) containing two 
ATTTA regions. 

BTK669 is a 48-mer making nine individual base pair 
changes eliminating three possible polyadenylation sites 
(ATATAA, AATCAA, and AATTAA) and a single ATTTA 
site. 

BTK930 is a 39-mer designed to increase the G+C 
content and to eliminate a potential polyadenylation site
(AATAAT - a major site). This region did contain a nine base 
pair region of consecutive A+T sequence. One of the base 
pair changes was a G to A because a G at this position would 
have created a G+C rich region (CCGG(G)C). Since
sequencing reactions indicate that there can be difficulties 
generating sequence through G+C consecutive bases, it was 
thought to be prudent to avoid generating potentially prob- 
lematic regions even if they were problematic only in vitro. 

BTK1110 is a 32-mer designed to introduce five changes
in the wild-type gene. One potential site (AATAAT - a major 
site) was eliminated in the midst of an A+T rich region (19 
of 22 base pairs). 

BTK1380A and BTK1380T are responsible for 14 indi- 
vidual base pair changes. The first region (1380A) has 17 
consecutive A+T base pairs. In this region is an ATTTA and 
a potential polyadenylation site (AATAAT). The 100-mer
(1380T) contains all the changes dictated by 1380A. The
large size of this primer was in part an experiment to
determine if it was feasible to utilize large oligonucleotides
for mutagenesis (over 60 bases in length). A second consideration
was that the 100-mer was used to mutagenize a
template which had previously been mutagenized by
1380A. The original primer ordered to mutagenize the
region downstream and adjacent to 1380A did not anneal
efficiently to the desired site as indicated by an inability to
obtain clean sequence utilizing the primer. The large region
of homology of 1380T did assure proper annealing. The
extended size of 1380T was more of a convenience rather
than a necessity. The second region adjacent to 1380A 
covered by 1380T has a high A+T content (22 of 29 bases 
are A or T). 

BTK1600 is a 27-mer responsible for five individual base
pair changes. An ATTTA region and a plant polyadenylation
site were identified and the appropriate changes engineered. 

A total of 62 bases were changed by site-directed 
mutagenesis. The G+C content increased by 55 base pairs, 
the potential polyadenylation sites were reduced from 18 to 
seven and the ATTTA sequences decreased from 13 to seven. 
The changes in the DNA sequence resulted in changes in 55 
of the 579 codons in the truncated B.t.k. gene in pMON5342
(approximately 9.5%). 
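
The codon-level summary above can be reproduced from any pair of aligned, in-frame sequences. The following is a minimal sketch: the function names and the twelve-base toy sequences are hypothetical, and only the quoted figures (55 of 579 codons) are taken from the text.

```python
def codon_changes(wild_type, modified):
    """Count codon positions that differ between two aligned, in-frame sequences.

    Returns (number_of_changed_codons, total_codons).
    """
    wt, mod = wild_type.upper(), modified.upper()
    if len(wt) != len(mod) or len(wt) % 3 != 0:
        raise ValueError("sequences must have equal length, a multiple of 3")
    total = len(wt) // 3
    changed = sum(1 for i in range(0, len(wt), 3) if wt[i:i + 3] != mod[i:i + 3])
    return changed, total


def percent(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Hypothetical toy pair: two silent changes (AAA->AAG, TTT->TTC).
toy_wt = "ATGAAATTTGGA"
toy_mod = "ATGAAGTTCGGA"
print(codon_changes(toy_wt, toy_mod))

# Figure quoted in the text: 55 of the 579 codons were changed.
print(percent(55, 579))
```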

Referring to Table IV, modified B.t.k. HD-1 genes were
constructed that contained all of the above modifications
(pMON5370) or various subsets of individual modifications.
These genes were inserted into pMON893 for plant transformation
and tobacco plants containing these genes were
analyzed. The analysis of tobacco plants with the individual
modifications was undertaken for several reasons. Expression
of the wild type truncated gene in tobacco is very poor,
resulting in infrequent identification of plants toxic to tobacco
hornworm (THW). Toxicity is defined by leaf feeding assays as
at least 60% mortality of tobacco hornworm neonate larvae with
a damage rating of 1 or less (scale is 0 to 4; 0 is equivalent to total
protection, 4 total damage). The modified HD-1 gene
(pMON5370) shows a large increase in expression (estimated
to be approximately 100-fold; see Table VIII) in
tobacco. Therefore, increases in expression of the wild-type
gene due to individual modifications would be apparent as
a large increase in the frequency of toxic tobacco plants and
the presence of detectable B.t.k. protein. Results are shown
in the following table:

TABLE IV

Relative Effects of Regional Modifications within the B.t.k. Gene

Construct    Position Modified                     # of Plants   # of Toxic Plants

pMON5370     185, 240, 669, 930,                   38            22
             1110, 1380a+b, 1600
pMON10707    185, 240, 462, 669                    48            19
pMON10706    930, 1110, 1380a+b, 1600              43            1
pMON10539    185                                   55            2
pMON10537    240                                   57            17
pMON10540    185, 240                              88            23
pMON10705    462                                   47            1


The effects of each individual oligonucleotide's changes
on expression did reveal some overall trends. Six different
constructs were generated which were designed to identify
the key regions. The nine different oligonucleotides were
divided in half by their position on the gene. Changes in the
N-terminal half were incorporated into pMON10707 (185,
240, 462, 669). C-terminal half changes were incorporated
into pMON10706 (930, 1110, 1380a+b, 1600). The results
of analysis of plants with these two constructs indicate that
pMON10707 produces a substantial number of toxic plants
(19 of 48). Protein from these plants is detectable by ELISA
analysis. pMON10706 plants were rarely identified as insecticidal
(1 of 43) and the levels of B.t.k. were barely detectable
by immunological analysis. Investigation of the N-terminal
changes in greater detail was done with 4 pMON
constructs: 10539 (185 alone), 10537 (240 alone), 10540
(185 and 240) and 10705 (462 alone). The results indicate
that the presence of the changes in 240 were required to
generate a substantial number of toxic plants (pMON10540,
23 of 88; pMON10537, 17 of 57). The absence of the 240
changes resulted in a low frequency of toxic plants with low
B.t.k. protein levels, identical to results with the wild type
gene. These results indicate that the changes in 240 are
responsible for a substantial increase in expression levels
over an analogous wild-type construct in tobacco. Changes
in additional regions (185, 462, 669) in conjunction with 240
may result in increases in B.t.k. expression (>2 fold). However,
changes at the 240 region of the N-terminal portion of
the gene do result in dramatic increases in expression.
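
The trend described above can be checked by converting the Table IV counts into frequencies of toxic plants. The snippet below is an illustrative sketch only; the dictionary layout and function name are hypothetical, and the counts are those given in Table IV.

```python
# Table IV data: construct -> (modified regions, number of plants, number toxic).
TABLE_IV = {
    "pMON5370": (["185", "240", "669", "930", "1110", "1380a+b", "1600"], 38, 22),
    "pMON10707": (["185", "240", "462", "669"], 48, 19),
    "pMON10706": (["930", "1110", "1380a+b", "1600"], 43, 1),
    "pMON10539": (["185"], 55, 2),
    "pMON10537": (["240"], 57, 17),
    "pMON10540": (["185", "240"], 88, 23),
    "pMON10705": (["462"], 47, 1),
}


def toxic_percent(plants, toxic):
    """Frequency of toxic plants, in percent."""
    return 100.0 * toxic / plants


# Split constructs by whether the 240 region was modified.
with_240 = {}
without_240 = {}
for construct, (regions, plants, toxic) in TABLE_IV.items():
    target = with_240 if "240" in regions else without_240
    target[construct] = toxic_percent(plants, toxic)

for label, group in (("240 modified", with_240), ("240 unmodified", without_240)):
    for construct, pct in group.items():
        print(f"{label:15s} {construct:10s} {pct:5.1f}% toxic")
```

Every construct carrying the 240 changes gives more than a quarter toxic plants, while every construct lacking them stays below five percent, which is the pattern the paragraph above describes.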

Despite the importance of the alteration of the 240 region
in expression of modified genes, increased expression can be
achieved by alteration of other regions. Hybrid genes, part
wild-type, part synthetic, were generated to determine the
effects of synthetic gene segments on the levels of B.t.k.
expression. A hybrid gene was generated with a synthetic
N-terminal third (base pair 1 to 590 of FIG. 2: to the XbaI
site) with the C-terminal wild type B.t.k. HD-1
(pMON5378). Plants transformed with this vector were as
toxic as plants transformed with the modified HD-1 gene
(pMON5370). This is consistent with the alteration of the
240 region. However, pMON10538, a hybrid with a wild-type
N-terminal third (wild type gene for the first 600 base
pairs, to the second XbaI site) and a synthetic C-terminal last
two-thirds (base pair 590 to 1845 of FIG. 3), was used to
transform tobacco and resulted in a dramatic increase in
expression. The levels of expression do not appear to be as
high as those seen with the synthetic gene, but are comparable
to the modified gene levels. These results indicate that
modification of the 240 segment is not essential to increased
expression since pMON10538 has an intact 240 region. A
fully synthetic gene is, in most cases, superior for expression
levels of B.t.k. (See Example 2.)

Example 2 - Fully Synthetic B.t.k. HD-1 Gene

A synthetic B.t.k. HD-1 gene was designed using the
preferred plant codons listed in Table V below. Table V lists
the codons and frequency of use in plant genes of dicotyledonous
plants compared to the frequency of their use in the
wild type B.t.k. HD-1 gene (amino acids 1-615) and the
synthetic gene of this example. The total number of each
amino acid in this segment of the gene is listed in the
parentheses under the amino acid designated.

TABLE V

Codon Usage in Synthetic B.t.k. HD-1 Gene

                          Percent Usage in
Amino Acid   Codon   Plants   Wt B.t.k.   Syn

ARG (43)     CGA        7        11         2
             CGC       11         5         5
             CGG        5         2         0
             CGU       25        14        27
             AGA       29        55        41
             AGG       23        14        25
LEU (49)     CUA        8        16         4
             CUC       20         0        20
             CUG       10         2         6
             CUU       28        22        24
             UUA        5        50         0
             UUG       30        10        45
SER (64)     UCA       14        27         5
             UCC       26         9        28
             UCG        3         8         0
             UCU       21        19        31
             AGC       21         6        32
             AGU       15        31         5
THR (42)     ACA       21        31        14
             ACC       41        19        53
             ACG        7        14         0
             ACU       31        36        33
PRO (34)     CCA       45        35        53
             CCC       19         6        12
             CCG        9        21         3
             CCU       26        38        32
ALA (31)     GCA       23        38        26
             GCC       32         9        29
             GCG        3         3         0
             GCU       41        50        45
GLY (46)     GGA       32        52        45
             GGC       20        17        15
             GGG       11        15         6
             GGU       37        15        34
ILE (46)     AUA       12        39         2
             AUC       45        11        67
             AUU       43        50        30
VAL (38)     GUA        9        45         3
             GUC       20         5        16
             GUG       28        11        37
             GUU       43        39        45
LYS (3)      AAA       36       100        33
             AAG       64         0        67
ASN (44)     AAC       72        27        80
             AAU       28        73        20
GLN (31)     CAA       64        77        61
             CAG       36        23        39
HIS (10)     CAC       65         0        80
             CAU       35       100        20
GLU (30)     GAA       48        87        50
             GAG       52        13        50
ASP (23)     GAC       48        17        65
             GAU       52        83        35
TYR (25)     UAC       68        20        72
             UAU       32        80        28
CYS (2)      UGC       78        50       100
             UGU       22        50         0
PHE (36)     UUC       56        17        83
             UUU       44        83        17
MET (9)      AUG      100       100       100
TRP (9)      UGG      100       100       100

The resulting synthetic gene lacks ATTTA sequences, 
contains only one potential polyadenylation site and has a 
G+C content of 48.5%. FIG. 3 is a comparison of the 
wild-type HD-1 sequence to the synthetic gene sequence for
amino acids 1-615. There is approximately 77% DNA 
homology between the synthetic gene and the wild-type 
gene and 356 of the 615 codons have been changed 
(approximately 60%). 
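
Percent-usage figures of the kind shown in Table V can be computed from a coding sequence by counting codons and normalizing within each amino acid. The sketch below assumes the standard genetic code and uses hypothetical names and a hypothetical five-codon toy sequence; it is not the procedure used to prepare Table V.

```python
from collections import Counter, defaultdict

# Standard genetic code in the conventional T, C, A, G ordering ("*" marks stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage(coding_seq):
    """Return {amino_acid: {codon: percent_usage}} for an in-frame sequence."""
    seq = coding_seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore any trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    by_amino_acid = defaultdict(dict)
    for codon, n in counts.items():
        by_amino_acid[CODON_TABLE[codon]][codon] = n
    return {
        aa: {codon: 100.0 * n / sum(codons.values()) for codon, n in codons.items()}
        for aa, codons in by_amino_acid.items()
    }


# Hypothetical toy sequence: Lys, Lys, Lys, His, His.
usage = codon_usage("AAAAAGAAGCATCAC")
for aa, codons in sorted(usage.items()):
    for codon, pct in sorted(codons.items()):
        print(aa, codon, round(pct, 1))
```

Codons are reported in DNA form (T rather than U); Table V writes the same codons in RNA form.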

Example 3 - Synthetic B.t.k. HD-73 Gene

The crystal protein toxin from B.t.k. HD-73 exhibits a
higher unit activity against some important agricultural
pests. The toxin proteins of HD-1 and HD-73 exhibit substantial
homology (~90%) in the N-terminal 450 amino acids,
but differ substantially in the amino acid region 451-615.
Fusion proteins comprising amino acids 1-450 of HD-1 and
451-615 of HD-73 exhibit the insecticidal properties of the
wild-type HD-73. The strategy employed was to use the
5'-two thirds of the synthetic HD-1 gene (first 1350 bases, up
to the SacI site) and to dramatically modify the final 590
bases (through amino acid 645) of the HD-73 in a manner
consistent with the algorithm used to design the synthetic
HD-1 gene. Table VI below lists the oligonucleotides used
to modify the HD-73 gene in the order used in the gene from
5' to 3' end. Nine oligonucleotides were used in a 590 base
pair region, each oligonucleotide ranging in size from 33 to 60
bases. The only regions left unchanged were areas where
there were no long consecutive strings of A or T bases
(longer than six). All polyadenylation sites and ATTTA sites
were eliminated.

TABLE VI

Mutagenesis Primers for B.t.k. HD-73

Primer     Length (bp)   Sequence

73K1363    51            AATACTATCG GATGCGATGA TGTTGTTGAA CTCAGCACTA
                         CGGTGTATCC A
73K1437    33            TCCTGAAATG ACAGAACCGT TGAAGAGAAA GTT
73K1471    48            ATTTCCACTG CTGTTGAGTC TAACGAGGTC TCCACCAGTG
                         AATCCTGG
73K1561    60            GTGAATAGGG GTCACAGAAG CATACCTCAC ACGAACTCTA
                         TATCTGGTAG ATGTTGGATG G
73K1642    33            TGTAGCTGGA ACTGTATTGG AGAAGATGGA TGA
73K1675    48            TTCAAAGTAA CCGAAATCGC TGGATTGGAG ATTATCCAAG
                         GAGGTAGC
73K1741    39            ACTAAAGTTT CTAACACCCA CGATGTTACC GAGTGAAGA
73K1797    36            AACTCGAATG AACTCGAATC TGTCGATAAT CACTCC
73KTERM    54            GGACACTAGA TCTTAGTGAT AATCGGTCAC ATTTGTCTTG
                         AGTCCAAGCT GGTT



The resulting gene has two potential polyadenylation sites
(compared to 18 in the WT) and no ATTTA sequence (12 in
the WT). The G+C content has increased from 37% to 48%.
A total of 59 individual base pair changes were made using
the primers in Table VI. Overall, there is 90% DNA homology
between the region of the HD-73 gene modified by site
directed mutagenesis and the wild-type sequence of the
analogous region of HD-73. The synthetic HD-73 is a hybrid
of the first 1360 bases from the synthetic HD-1 and the next
590 bases or so of modified HD-73 sequence. FIG. 4 is a
comparison of the above-described synthetic B.t.k. HD-73
and the wild-type B.t.k. HD-73 encoding amino acids
1-645. In the modified region of the HD-73 gene 44 of the
170 codons (25%) were changed as a result of the site-directed
mutagenesis changes resulting from the oligonucleotides
found in Table VI. Overall, approximately 50% of the
codons in the synthetic B.t.k. HD-73 differ from the analogous
segment of the wild-type HD-73 gene.

A one base pair deletion in the synthetic HD-73 gene was
detected in the course of sequencing the 3' end at base pair
1890. This results in a frame-shift mutation at amino acid
625 with a premature stop codon at amino acid 640
(pMON5379). Table VII below compares the codon usage of
the wild-type gene of B.t.k. HD-73 versus the synthetic gene
of this example for amino acids 451-645 and codon usage
of naturally occurring genes of dicotyledonous plants. The
total number of each amino acid encoded in this segment of
the gene is found in the parentheses under the amino acid
designation.
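
The effect of a one base pair deletion (a shifted reading frame followed by a premature stop codon) can be illustrated on a toy sequence. This is a sketch under stated assumptions: the sequence, the deletion position and the function names are hypothetical and chosen only to show the mechanism, not to reproduce the pMON5379 sequence.

```python
# Standard genetic code in the conventional T, C, A, G ordering ("*" marks stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate from the first base, stopping at the first in-frame stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete_base(seq, index):
    """Return seq with the base at the given zero-based index removed."""
    return seq[:index] + seq[index + 1:]


# Hypothetical toy gene: ATG GCT GAC TTC CTG AAC TGG TAA
original = "ATGGCTGACTTCCTGAACTGGTAA"
mutant = delete_base(original, 6)  # remove the first base of the third codon

print(translate(original))  # full-length toy protein
print(translate(mutant))    # altered residues, then an early stop
```

In the toy case the first two residues are unchanged, the next two are altered by the frame shift, and translation then ends at a stop codon that was out of frame in the original sequence.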

TABLE VII

Codon Usage in Synthetic B.t.k. HD-73 Gene

                          Percent Usage in
Amino Acid   Codon   Plants   Wt HD-73   Syn

ARG (10)     CGA        7        10         0
             CGC       11         0         8
             CGG        5        10         0
             CGU       25        20        23
             AGA       29        60        62
             AGG       23         0         8
LEU (12)     CUA        8        25         8
             CUC       20        17        58
             CUG       10        17         8
             CUU       28         8         0
             UUA        5        33         8
             UUG       30         0        17
SER (21)     UCA       14        24        18
             UCC       26        10        27
             UCG        3        10         0
             UCU       21        24        18
             AGC       21         0        14
             AGU       15        33        23
THR (15)     ACA       21        47        38
             ACC       41        13        31
             ACG        7        13         0
             ACU       31        27        31
PRO          CCA       45        71        71
             CCC       19         0         0
             CCG        9        14         0
             CCU       26        14        29
ALA          GCA       23        29        31
             GCC       32         7         8
             GCG        3        21        15
             GCU       41        43        46
GLY          GGA       32        33        43
             GGC       20         0         0
             GGG       11        27        14
             GGU       37        40        43
ILE (15)     AUA       12        33         7
             AUC       45         7        40
             AUU       43        60        53
VAL (15)     GUA        9        40         7
             GUC       20         0         7
             GUG       28        20        36
             GUU       43        40        50
LYS (3)      AAA       36        67       100
             AAG       64        33         0
ASN (20)     AAC       72        20        53
             AAU       28        80        47
GLN (5)      CAA       64        60        67
             CAG       36        40        33
HIS (3)      CAC       65        67       100
             CAU       35        33         0
GLU (7)      GAA       48        86        57
             GAG       52        14        43
ASP (5)      GAC       48        40        50
             GAU       52        60        50
TYR (5)      UAC       68         0        20
             UAU       32       100        80
CYS (0)      UGC       78         0         0
             UGU       22         0         0
PHE (13)     UUC       56         8        67
             UUU       44        92        33
MET (2)      AUG      100       100       100
TRP (2)      UGG      100       100       100



Another truncated synthetic HD-73 gene was constructed. 
The sequence of this synthetic HD-73 gene is identical to 
that of the above synthetic HD-73 gene in the region in 
which they overlap (amino acids 29-615), and it also 
encodes Met-Ala at the N-terminus. FIG. 8 shows a com- 
parison of this truncated synthetic HD-73 gene with the 
N-terminal Met-Ala versus the wild-type HD-73 gene. 

While the previous examples have been directed at the
preparation of synthetic and modified genes encoding truncated
B.t.k. proteins, synthetic or modified genes can also be
prepared which encode full length toxin proteins.

One full length B.t.k. gene consists of the synthetic
HD-73 sequence of FIG. 4 from nucleotide 1-1845 plus 
wild-type HD-73 sequence encoding amino acids 616 to the 
C-terminus of the native protein. FIG. 9 shows a comparison 
of this synthetic/wild-type full length HD-73 gene versus the 
wild-type full length HD-73 gene. 
Another full length B.t.k. gene consists of the synthetic
HD-73 sequence of FIG. 4 from nucleotide 1-1845 plus a
modified HD-73 sequence encoding amino acids 616 to the
C-terminus of the native protein. The C-terminal portion has
been modified by site-directed mutagenesis to remove putative
polyadenylation signals and ATTTA sequences according
to the algorithm of FIG. 1. FIG. 10 shows a comparison
of this synthetic/modified full length HD-73 gene versus the
wild-type full length HD-73 gene.

Another full length B.t.k. gene consists of a fully synthetic
HD-73 sequence which incorporates the synthetic HD-73
sequence of FIG. 4 from nucleotide 1-1845 plus a synthetic
sequence encoding amino acids 616 to the C-terminus of the
native protein. The C-terminal synthetic portion has been 
designed to eliminate putative polyadenylation signals and 
ATTTA sequences and to include plant preferred codons. 
FIG. 11 shows a comparison of this fully synthetic full 
length HD-73 gene versus the wild-type full length HD-73 
gene. 

Alternatively, another full length B.t.k. gene consists of a
fully synthetic sequence comprising base pairs 1-1830 of
B.t.k. HD-1 (FIG. 3) and base pairs 1834-3534 of B.t.k.
HD-73 (FIG. 11).

Example 4 - Expression of Modified and Synthetic
B.t.k. HD-1 and Synthetic HD-73

A number of plant transformation vectors for the expression
of B.t.k. genes were constructed by incorporating the
structural coding sequences of the previously described
genes into plant transformation cassette vector pMON893.
The respective intermediate transformation vector is
inserted into a suitable disarmed Agrobacterium vector such
as A. tumefaciens ACO, supra. Tissue explants are cocultured
with the disarmed Agrobacterium vector and plants
regenerated under selection for kanamycin resistance using
known protocols: tobacco (Horsch et al., 1985); tomato
(McCormick et al., 1986) and cotton (Trolinder et al., 1987).
a) Tobacco.

The level of B.t.k. HD-1 protein in transgenic tobacco
plants containing pMON9921 (wild type truncated),
pMON5370 (modified HD-1, Example 1, FIG. 2) and
pMON5377 (synthetic HD-1, Example 2, FIG. 3) was
analyzed by Western analysis. Leaf tissue was frozen in
liquid nitrogen, ground to a fine powder and then ground in
a 1:2 (wt:volume) of SDS-PAGE sample buffer. Samples
were frozen on dry ice, then incubated for 10 minutes in a
boiling water bath and microfuged for 10 minutes. The
protein concentration of the supernatant was determined by
the method of Bradford (Anal. Biochem. 72:248-254). Fifty
µg of protein was run per lane on 9% SDS-PAGE gels, the
protein transferred to nitrocellulose and the B.t.k. HD-1
protein visualized using antibodies produced against B.t.k.
HD-1 protein as the primary antibody and alkaline phosphatase
conjugated second antibody as described by the
manufacturer (Promega, Madison, WI). Purified HD-1 tryptic
fragment was used as the control. Whereas the B.t.k.
protein from tobacco plants containing pMON9921 was
below the level of detection, the B.t.k. protein from plants
containing the modified (pMON5370) and synthetic
(pMON5377) genes was easily detected. The B.t.k. protein
from plants containing pMON9921 remained undetectable,
even with 10 fold longer incubation times. The relative
levels of B.t.k. HD-1 protein in these plants are estimated in
Table VIII. Because the protein from plants containing
pMON9921 was not observed, the level of protein in these
plants was estimated from the relative mRNA levels (see
below). Plants containing the modified gene (pMON5370)
expressed approximately 100 fold more B.t.k. protein than
plants containing the wild-type gene (pMON9921). Plants
containing the fully synthetic B.t.k. HD-1 gene
(pMON5377) expressed approximately five fold more protein
than plants containing the modified gene. The modified
gene contributes the majority of the increase in B.t.k.
expression observed. The plants used to generate the above
data are the best representatives from each construct based
either on a tobacco hornworm bioassay or on data derived
from previous Western analysis.

TABLE VIII

Expression of B.t.k. HD-1 Protein in Transgenic Tobacco

Gene                       B.t.k. Protein*   Fold Increase in
Description   Vector       Concentration     B.t.k. Expression

Wild type     pMON9921     10                1
Modified      pMON5370     1000              100
Synthetic     pMON5377     5000              500

*B.t.k. protein concentrations are expressed in ng/mg of total soluble protein.
The level of B.t.k. protein for plants containing the wild type gene is
estimated from mRNA levels.
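
The fold-increase column of Table VIII follows directly from the concentration column. A minimal sketch of the arithmetic, using the Table VIII values and hypothetical variable names:

```python
# B.t.k. protein concentrations from Table VIII (ng per mg total soluble protein).
concentration = {
    "pMON9921": 10,    # wild type (estimated from mRNA levels)
    "pMON5370": 1000,  # modified
    "pMON5377": 5000,  # synthetic
}

# Fold increase relative to the wild-type construct.
baseline = concentration["pMON9921"]
fold_increase = {vector: value / baseline for vector, value in concentration.items()}

# Synthetic gene relative to the modified gene.
synthetic_vs_modified = concentration["pMON5377"] / concentration["pMON5370"]

for vector, fold in fold_increase.items():
    print(f"{vector}: {fold:g}-fold")
print(f"synthetic vs. modified: {synthetic_vs_modified:g}-fold")
```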

Plants containing these genes were tested for bioactivity
to determine whether the increased quantities of protein
observed by Western analysis result in a corresponding
increase in bioactivity. Leaves from the same plants used for
the Western data in Table 1 were tested for bioactivity
against two insects. A detached leaf bioassay was first done
using tobacco hornworm, an extremely sensitive lepidopteran
insect. Leaves from all three transgenic tobacco
plants were totally protected and 100% mortality of tobacco
hornworm observed (see Table IX below). A much less
sensitive insect, beet armyworm, was then used in another
detached leaf bioassay. Beet armyworm is approximately
500 fold less sensitive to B.t.k. HD-1 protein than tobacco
hornworm. The difference in sensitivity of these two insects
was determined using purified HD-1 protein in a diet incorporation
assay (see below). Plants containing the wild-type
gene (pMON9921) showed only minimal protection against
beet armyworm, whereas plants containing the modified
gene showed almost complete protection and plants containing
the fully synthetic gene were totally protected
against beet armyworm damage. The results of these bioassays
confirm the levels of B.t.k. HD-1 expression observed
in the Western analysis and demonstrate that the increased
levels of B.t.k. HD-1 protein correlate with increased
insecticidal activity.

TABLE IX

Protection of Tobacco Plants from Tobacco Hornworm and Beet Armyworm

Gene                       Tobacco Hornworm   Beet Armyworm
Description   Vector       Damage*            Damage*

None          None         NL                 NL
Wild type     pMON9921     0                  3
Modified      pMON5370     0                  1
Synthetic     pMON5377     0                  0

*Extent of insect damage was rated: 0, no damage; 1, slight; 2, moderate; 3,
severe; or NL, no leaf left.

The bioactivity of the B.t.k. HD-1 protein produced by
these transgenic plants was further investigated to more
accurately quantitate the relative activities. Leaf tissue from
tobacco plants containing the wild-type, modified and synthetic
genes were ground in 100 mM sodium carbonate
buffer, pH 10 at a 1:2 (wt:vol) ratio. Particulate material was
removed by centrifugation. The supernatant was incorporated
into a synthetic diet similar to that described by
Marrone et al. (1985). The diet medium was prepared the
day of the test with the plant extract solutions incorporated
in place of the 20% water component. One ml of the diet was
aliquoted into 96 well plates.

After the diet dried, one neonate tobacco budworm larva was
added to each well. Sixteen insects were tested with each
plant sample. The plants were incubated at 27° C. After seven
days, the larvae from each treatment were combined and
weighed on an analytical balance. The average weight per
insect was calculated and compared to a standard curve
relating B.t.k. protein concentrations to average larval
weight. Insect weight was inversely proportional (in a
logarithmic manner) to the relative increase in B.t.k. protein
concentration. The amount of B.t.k. HD-1 protein, based on the
extent of larval growth inhibition, was determined for two
different plants containing each of the three genes. The
specific activity (ng of B.t.k. HD-1 per mg of plant protein)
was determined for each plant. Plants containing the modified
HD-1 gene (pMON5370) averaged approximately 1400 ng (1200 and
1600 ng) of B.t.k. HD-1 per mg of plant extract protein. This
value compares closely with the 1000 ng of B.t.k. HD-1 protein
per mg of plant extract protein as determined by Western
analysis (Table I). B.t.k. HD-1 concentrations for the plants
containing the synthetic HD-1 gene averaged approximately
8200 ng (7200 and 9200 ng) of B.t.k. HD-1 protein per mg of
plant extract protein. This number compares well to the 5000
ng of HD-1 protein per mg of plant extract protein estimated
by Western analysis. Likewise, plants containing the synthetic
gene showed approximately a six-fold higher specific activity
than the corresponding plants containing the modified gene for
these bioassays. In the Western analysis the ratio was
approximately 10 fold; again, both are in good agreement. The
level of B.t.k. protein in plants containing the wild-type HD-1
gene (pMON9921) was too low to give a significant decrease in
larval weight and hence was below a level that could be
quantitated in this assay. In conclusion, the levels of B.t.k.
HD-1 protein determined by both the bioassays and the Western
analysis for these plants containing the modified and
synthetic genes agree, which demonstrates that the B.t.k. HD-1
protein produced by these plants is biologically active.
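
The bioassay averages and the fold difference quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original specification; the dictionary keys and function names are illustrative assumptions, and only the four per-plant values (1200, 1600, 7200 and 9200 ng/mg) come from the text.

```python
from statistics import mean

# Per-plant bioassay estimates quoted in the text, in ng of B.t.k. HD-1
# per mg of plant extract protein (two plants per gene).
# The dictionary keys are illustrative labels only.
BIOASSAY_NG_PER_MG = {
    "modified": [1200, 1600],
    "synthetic": [7200, 9200],
}


def mean_specific_activity(values):
    """Average specific activity over the plants tested for one gene."""
    return mean(values)


def fold_difference(higher, lower):
    """Ratio of two specific activities."""
    return higher / lower


modified = mean_specific_activity(BIOASSAY_NG_PER_MG["modified"])
synthetic = mean_specific_activity(BIOASSAY_NG_PER_MG["synthetic"])
ratio = fold_difference(synthetic, modified)

print(f"modified:  {modified} ng/mg")
print(f"synthetic: {synthetic} ng/mg")
print(f"synthetic / modified: {ratio:.1f}-fold")
```

The ratio of the two averages is a little under six, consistent with the "approximately a six-fold" figure given for the bioassays.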

The levels of mRNA were determined in the plants containing
the wild-type B.t.k. HD-1 gene (pMON9921) and the modified
gene (pMON5370) to establish whether the increased levels of
protein production result from increased transcription or
translation. mRNA from plants containing the synthetic gene
could not be analyzed directly with the same DNA probe as
used for the wild-type and modified genes because of the
numerous changes made in the coding sequence. mRNA was
isolated and hybridized with a single-stranded DNA probe
homologous to approximately the 5' 90 bp of the wild-type or
modified gene coding sequences. The hybrids were digested
with S1 nuclease and the protected probe fragments analyzed
by gel electrophoresis. Because the procedure used a large
excess of probe and long hybridization time, the amount of
protected probe is proportional to the amount of B.t.k. mRNA
present in the sample. Two plants expressing the modified
gene (pMON5370) were found to produce up to ten-fold more RNA
than a plant expressing the wild-type gene (pMON9921).

The increased mRNA level from the modified gene is consistent
with the result expected from the modifications introduced
into this gene. However, this 10 fold increase in mRNA with
the modified gene compared to the wild-type gene is in
contrast to the 100 fold increase in B.t.k. protein from these
genes in tobacco plants. If the two mRNAs were equally well
translated, then a 10 fold increase in stable mRNA would be
expected to yield a 10 fold increase in protein. The higher
increase in protein indicates that the modified gene mRNA is
translated at about a 10 fold higher efficiency than
wild-type. Thus, about half of the total effect on gene
expression can be explained by changes in mRNA levels and
about half by changes in translational efficiency. This
increase in translational efficiency is striking in that only
about 9.5% of the codons have been changed in the modified
gene; that is, this effect is clearly not due to wholesale
codon usage changes. The increased translational efficiency
could be due to changes in mRNA secondary structure that
affect translation or to the removal of specific translational
blockades due to specific codons that were changed.
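
The reasoning in this paragraph can be restated as a small calculation. The sketch below is in Python and is not part of the original specification. It assumes a simple multiplicative model (protein fold increase = mRNA fold increase x translational-efficiency fold increase), and it reads "about half" as half of the effect on a logarithmic scale; both are interpretive assumptions, and the function names are illustrative.

```python
import math


def translational_fold(protein_fold, mrna_fold):
    """Fold increase left unexplained by mRNA level, assuming that
    protein fold = mRNA fold * translational-efficiency fold."""
    return protein_fold / mrna_fold


def log_share(component_fold, total_fold):
    """Fraction of the total fold increase attributable to one
    component, measured on a log10 scale."""
    return math.log10(component_fold) / math.log10(total_fold)


# Values quoted in the text for wild-type vs. modified HD-1 in tobacco.
protein_fold = 100.0
mrna_fold = 10.0

translation = translational_fold(protein_fold, mrna_fold)
mrna_share = log_share(mrna_fold, protein_fold)
translation_share = log_share(translation, protein_fold)

print(f"translational efficiency: {translation:.0f}-fold")
print(f"share from mRNA level:    {mrna_share:.2f}")
print(f"share from translation:   {translation_share:.2f}")
```

Under these assumptions each component contributes a 10 fold factor, so each accounts for half of the 100 fold increase on the log scale.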

The increased expression seen with the synthetic HD-1 gene
was also seen with a synthetic HD-73 gene in tobacco. B.t.k.
HD-73 was undetected in extracts of tobacco plants containing
the wild-type truncated HD-73 gene (pMON5367), whereas B.t.k.
HD-73 protein was easily detected in extracts from tobacco
plants containing the synthetic HD-73 gene of FIG. 4
(pMON5383). Approximately 1000 ng of B.t.k. HD-73 protein was
detected per mg of total soluble plant protein.

As described in Example 3 above, the B.t.k. HD-73 protein
encoded in pMON5383 contains a small C-terminal extension of
amino acids not encoded in the wild-type HD-73 protein. These
extra amino acids had no effect on insect toxicity or on
increased plant expression. A second synthetic HD-73 gene was
constructed as described in Example 3 (FIG. 8) and used to
transform tobacco (pMON5390). Analysis of plants containing
pMON5390 showed that this gene was expressed at levels
comparable to that of pMON5383 and that these plants had
similar insecticidal efficacy.

In tobacco plants the synthetic HD-1 gene was expressed at
approximately a 5-fold higher level than the synthetic HD-73
gene. However, this synthetic HD-73 gene still was expressed
at least 100-fold better than the wild-type HD-73 gene. The
HD-73 protein is approximately 5-fold more toxic to many
insect pests than the HD-1 protein, so both synthetic HD-1
and HD-73 genes provide approximately comparable insecticidal
efficacy in tobacco.

The full length B.t.k. HD-73 genes described in Example 3
were also incorporated into the plant transformation vector
pMON893 so that they were expressed from the En 35S promoter.
The synthetic/wild-type full length HD-73 gene of FIG. 9 was
incorporated into pMON893 to create pMON10506. The
synthetic/modified full length HD-73 gene of FIG. 10 was
incorporated into pMON893 to create pMON10526. The fully
synthetic HD-73 gene of FIG. 11 was incorporated into pMON893
to create pMON10518. These vectors were used to obtain
transformed tobacco plants, and the plants were analyzed for
insecticidal efficacy and for B.t.k. HD-73 protein levels by
Western blot or ELISA immunoassay.

Tobacco plants containing all three of these full length
B.t.k. genes produced detectable B.t.k. protein and showed
100% mortality of tobacco hornworm. This result is surprising
in light of previously reported attempts to express the full
length B.t.k. genes in transgenic plants. Vaeck et al. (1987)
reported that a full length B.t.k. berliner gene similar to
our HD-1 gene could not be detectably expressed in tobacco.
Barton et al. (1987) reported a similar result for another
full length gene from B.t.k. HD-1 (the so called 4.5 kb gene),
and further indicated that tobacco callus containing this
gene became necrotic, indicating that the full length gene
product was toxic to plant cells. Fischhoff et al. (1987)
reported that the full length B.t.k. HD-1 gene in tomato was
poorly expressed compared to a truncated gene, and no plants
that were fully toxic to tobacco hornworm could be recovered.
All three of the above reports indicated much higher
expression levels and recovery of toxic plants if the
respective B.t.k. genes were truncated. Adang et al. reported
that the full length HD-73 gene yielded a few tobacco plants
with some biological activity (none were highly toxic) against
hornworm and barely detectable B.t.k. protein. It was also
noted by them that the major B.t.k. mRNA in these plants was a
truncated 1.7 kb species that would not encode a functional
toxin. This indicated improper expression of the gene in
tobacco. In contrast to all of these reports, the three full
length B.t.k. HD-73 genes described above all led to
relatively high levels of protein and high levels of insect
toxicity.

B.t.k. protein and mRNA levels in tobacco plants are shown in
Table X for these three vectors. As can be seen from the
table, the synthetic/wild-type gene (pMON10506) produces
B.t.k. protein as about 0.01% of total soluble protein; the
synthetic/modified gene produces B.t.k. as about 0.02% of
total soluble protein; and the fully synthetic gene produces
B.t.k. as about 0.2% of total soluble protein. B.t.k. mRNA was
analyzed in these plants by Northern blot analysis using the
common 5' synthetic half of the genes as a probe. As shown in
Table X, the increased protein levels can largely be
attributed to increased mRNA levels. Compared to the
truncated modified and synthetic genes, this could indicate
that the major contributors to increased translational
efficiency are in the 5' half of the gene while the 3' half of
the gene contains mostly determinants of mRNA stability. The
increased protein levels also indicate that increasing the
amount of the full length gene that is synthetic or modified
increases B.t.k. protein levels. Compared to the truncated
synthetic B.t.k. HD-73 genes (pMON5383 or pMON5390), the fully
synthetic gene (pMON10518) produces as much or slightly more
protein, demonstrating that the full length genes are capable
of being expressed at high levels in plants. These tobacco
plants with high levels of full length HD-73 protein show no
evidence of abnormality and are fully fertile. The B.t.k.
protein levels in these plants also produce the expected
levels of insect toxicity based on feeding studies with beet
armyworm or diet incorporation assays of plant extracts with
tobacco budworm. The B.t.k. protein detected by Western blot
analysis in these tobacco plants often contains a varying
amount of protein of about 80 kDa which is apparently a
proteolytic fragment of the full length protein. The
C-terminal half of the full length protein is known to be
proteolytically sensitive, and similar proteolytic fragments
are seen from the full length gene in E. coli and B.t. itself.
These fragments are fully insecticidal. The Northern analysis
indicated that essentially all of the mRNA from these full
length genes was of the expected full length size. There is no
evidence of truncated mRNAs that could give rise to the 80 kDa
protein fragment. In addition, it is possible that the
fragment is not present in intact plant cells and is merely
due to proteolysis during extraction for immunoassay.

TABLE X

Full Length B.t.k. HD-73 Protein and
mRNA Levels in Transgenic Tobacco Plants

                                                    Relative
Gene                               B.t.k. protein   B.t.k.
description          Vector        concentration    mRNA level

Synthetic/wild type  pMON10506     >100             0.5
Synthetic/modified   pMON10526     400              1
Fully synthetic      pMON10518     >2000            40



Thus, there is no serious impediment to producing high levels
of B.t.k. HD-73 protein in plants from synthetic genes, and
this is expected to be true of other full length lepidopteran
active genes such as B.t.k. HD-1 or B.t. entomocidus. The
fully synthetic B.t.k. HD-1 gene of Example 3 has been
assembled in plant transformation vectors such as pMON893.

The fully synthetic gene in pMON10518 was also utilized in
another plant vector and analyzed in tobacco plants. Although
the CaMV35S promoter is generally a high level constitutive
promoter in most plant tissues, the expression level of genes
driven by the CaMV35S promoter is low in floral tissue
relative to the levels seen in leaf tissue. Because the
economically important targets damaged by some insects are the
floral parts or derived from floral parts (e.g., cotton
squares and bolls, tobacco buds, tomato buds and fruit), it
may be advantageous to increase the expression of B.t. protein
in these tissues over that obtained with the CaMV35S
promoter.

The 35S promoter of Figwort Mosaic Virus (FMV) is analogous
to the CaMV35S promoter. This promoter has been isolated and
engineered into a plant transformation vector analogous to
pMON893. Relative to the CaMV promoter, the FMV 35S promoter
is highly expressed in the floral tissue, while still
providing similar high levels of gene expression in other
tissues such as leaf. A plant transformation vector,
pMON10517, was constructed in which the full length synthetic
B.t.k. HD-73 gene of FIG. 11 was driven by the FMV 35S
promoter. This vector is identical to pMON10518 of Example 3
except that the FMV promoter is substituted for the CaMV
promoter. Tobacco plants transformed with pMON10517 and
pMON10518 were obtained and compared for expression of the
B.t.k. protein by Western blot or ELISA immunoassay in leaf
and floral tissue. This analysis showed that pMON10517
containing the FMV promoter expressed the full length HD-73
protein at higher levels in floral tissue than pMON10518
containing the CaMV promoter. Expression of the full length
B.t.k. HD-73 protein from pMON10517 in leaf tissue is
comparable to that seen with the most highly expressing plants
containing pMON10518. However, when floral tissue was
analyzed, tobacco plants containing pMON10518 that had high
levels of B.t.k. protein in leaf tissue did not have
detectable B.t.k. protein in the flowers. On the other hand,
flowers of tobacco plants containing pMON10517 had levels of
B.t.k. protein nearly as high as the levels in leaves, at
approximately 0.05% of total soluble protein. This analysis
showed that the FMV promoter could be used to produce
relatively high levels of B.t.k. protein in floral tissue
compared to the CaMV promoter.
b) Tomato. 

The wild-type, modified and synthetic B.t.k. HD-1 genes
tested in tobacco were introduced into other plants to
demonstrate the broad utility of this invention. Transgenic
tomatoes were produced which contain these three genes. Data
show that the increased expression observed with the modified
and synthetic gene in tobacco also extends to tomato. Whereas
the B.t.k. HD-1 protein is only barely detectable in plants
containing the wild type HD-1 gene (pMON9921), B.t.k. HD-1 was
readily detected and the levels determined for plants
containing the modified (pMON5370) or synthetic (pMON5377)
genes. Expression levels for the plants containing the
wild-type, modified and synthetic HD-1 genes were
approximately 10, 100 and 500 ng per mg of total plant extract
(see Table XI below). The increase in B.t.k. HD-1 protein for
the modified gene accounted for the majority of the increase
observed: 10 fold higher than the plants containing the
wild-type gene, compared to only an additional five-fold
increase for plants containing the synthetic gene. Again the
site-directed changes made in the modified gene are the major
contributors to the increased expression of B.t.k. HD-1.

TABLE XI

B.t.k. HD-1 Expression in
Transgenic Tomato Plant

                                              Fold Increase
Gene                        B.t.k. Protein*   in B.t.k.
Description   Vector        Concentration     Expression

Wild type     pMON9921      10                1
Modified      pMON5370      100               10
Synthetic     pMON5377      500               50

*B.t.k. HD-1 protein concentrations are expressed in ng/mg of total soluble
plant protein. Data for plants containing the wild-type gene are estimates from
mRNA levels and protein levels determined by ELISA.

These differences in B.t.k. HD-1 expression were confirmed
with bioassays against tobacco hornworm and beet armyworm.
Leaves from tomato plants containing each of these genes
controlled tobacco hornworm damage and produced 100%
mortality. With beet armyworm, leaves from plants containing
the wild-type HD-1 gene (pMON9921) showed significant damage,
leaves from plants containing the modified gene (pMON5370)
showed less damage and leaves from plants containing the
synthetic gene (pMON5377) were completely protected (see
Table XII below).

TABLE XII

Protection of Tomato Plants from
Tobacco Hornworm and Beet Armyworm

Gene                      Tobacco Hornworm   Beet Armyworm
Description   Vector      Damage*            Damage*

None          None        NL                 NL
Wild type     pMON9921    0                  3
Modified      pMON5370    0                  1
Synthetic     pMON5377    0                  0

*Damage was rated as shown in Table IX.

The generality of the synthetic gene approach was extended in
tomato with a synthetic B.t.k. HD-73 gene.

In tomato, extracts from plants containing the wild-type
truncated HD-73 gene (pMON5367) showed no detectable HD-73
protein. Extracts from plants containing the synthetic HD-73
gene (pMON5383) showed high levels of B.t.k. HD-73 protein,
approximately 2000 ng per mg of plant extract protein. These
data clearly demonstrate that the changes made in the
synthetic HD-73 gene lead to dramatic increases in the
expression of the HD-73 protein in tomato as well as in
tobacco.

In contrast to tobacco, the synthetic HD-73 gene in tomato is
expressed at approximately 4-fold to 5-fold higher levels than
the synthetic HD-1 gene. Because the HD-73 protein is about
5-fold more active than the HD-1 protein against many insect
pests including Heliothis species, the increased expression of
synthetic HD-73 compared to synthetic HD-1 corresponds to
about a 25-fold increased insecticidal efficacy in tomato.
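
The efficacy comparison here combines two ratios: relative expression level and relative potency of the protein. The sketch below is in Python and is not part of the original specification. It assumes the two ratios simply multiply, which is how the text arrives at its figures; the function name is illustrative.

```python
def relative_efficacy(expression_ratio, potency_ratio):
    """Relative insecticidal efficacy of HD-73 vs. HD-1 plants, assuming
    efficacy scales as (expression level) * (potency of the protein)."""
    return expression_ratio * potency_ratio


# HD-73 protein is described as about 5-fold more active than HD-1.
POTENCY_HD73_VS_HD1 = 5.0

# Tomato: synthetic HD-73 expressed 4- to 5-fold higher than synthetic HD-1.
tomato_low = relative_efficacy(4.0, POTENCY_HD73_VS_HD1)
tomato_high = relative_efficacy(5.0, POTENCY_HD73_VS_HD1)

# Tobacco: synthetic HD-1 expressed about 5-fold higher than synthetic HD-73,
# i.e. HD-73 expression is about 1/5 of HD-1 expression.
tobacco = relative_efficacy(1.0 / 5.0, POTENCY_HD73_VS_HD1)

print(f"tomato:  {tomato_low:.0f}- to {tomato_high:.0f}-fold")
print(f"tobacco: {tobacco:.0f}-fold (comparable efficacy)")
```

With the upper expression figure the product is 25-fold in tomato, while in tobacco the lower expression of HD-73 offsets its higher potency.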

In order to determine the mechanisms involved in the
increased expression of modified and synthetic B.t.k. HD-1
genes in tomato, S1 nuclease analysis of mRNA levels from
transformed tomato plants was performed. As indicated above, a
similar analysis had been performed with tobacco plants, and
this analysis showed that the modified gene produced up to
10-fold more mRNA than the wild-type gene. The analysis in
tomato utilized a different DNA probe that allowed the
analysis of wild-type (pMON9921), modified (pMON5370) and
synthetic (pMON5377) HD-1 genes with the same probe. This
probe was derived from the 5' untranslated region of the
CaMV35S promoter in pMON893 that was common to all three of
these vectors (pMON9921, pMON5370 and pMON5377). This S1
analysis indicated that B.t.k. mRNA levels from the modified
gene were 3 to 5 fold higher than for the wild-type gene, and
that mRNA levels for the synthetic gene were about 2 to 3 fold
higher than for the modified gene. Three independent
transformants were analyzed for each gene. Compared to the
fold increases in B.t.k. HD-1 protein from these genes in
tomato shown in Table XI, these mRNA increases can explain
about half of the total protein increase, as was seen in
tobacco for the wild-type and modified genes. For tomato the
total mRNA increase from wild-type to synthetic is about 6 to
15 fold compared to a protein increase of about 50 fold. This
result is similar to that seen for tobacco in comparing the
wild-type and modified genes, and it extends to the synthetic
gene as well. That is, about half of the total fold increase in
B.t.k. protein from wild-type to modified genes can be
explained by mRNA increases and about half by enhanced
translational efficiency. The same is also true in comparing
the modified gene to the synthetic gene. Although there is an
additional increase in RNA levels, this mRNA increase can
explain only about half of the total protein increase.
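
The "6 to 15 fold" total follows from multiplying the two stepwise mRNA ranges end to end. The sketch below is in Python and is not part of the original specification. It reproduces that product and the protein increase it leaves unexplained, again assuming that protein fold increase is the product of mRNA fold increase and a translational component; the function name is illustrative.

```python
def multiply_ranges(first, second):
    """Combine two (low, high) fold-increase ranges for successive steps."""
    return (first[0] * second[0], first[1] * second[1])


# Stepwise mRNA increases quoted in the text for tomato.
modified_vs_wild_type = (3.0, 5.0)
synthetic_vs_modified = (2.0, 3.0)

total_mrna = multiply_ranges(modified_vs_wild_type, synthetic_vs_modified)

# Protein increase from wild-type to synthetic (Table XI).
protein_fold = 50.0

# Fold increase not accounted for by mRNA level; the high end of the
# mRNA range gives the low end of the residual and vice versa.
residual = (protein_fold / total_mrna[1], protein_fold / total_mrna[0])

print(f"total mRNA increase: {total_mrna[0]:.0f} to {total_mrna[1]:.0f} fold")
print(f"unexplained by mRNA: {residual[0]:.1f} to {residual[1]:.1f} fold")
```

The residual factor of roughly 3 to 8 fold is of the same order as the mRNA increase itself, which is the sense in which mRNA explains about half of the total.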

The full length B.t.k. genes described above were also used
to transform tomato plants and these plants were analyzed for
B.t.k. protein and insecticidal efficacy. The results of this
analysis are shown in Table XIII. Plants containing the
synthetic/wild-type gene (pMON10506) produce the B.t.k. HD-73
protein at levels of about 0.01% of their total soluble
protein. Plants containing the synthetic/modified gene
(pMON10526) produce about 0.04% B.t.k. protein, and plants
containing the fully synthetic gene (pMON10518) produce about
0.2% B.t.k. protein. These results are very similar to the
tobacco plant results for the same genes. mRNA levels
estimated by Northern blot analysis in tomato also increase in
parallel with the protein level increase. As for tobacco with
these three genes, most of the protein increase can be
attributed to increased mRNA with a small component of
translational efficiency increase indicated for the fully
synthetic gene. The highest levels of full length B.t.k.
protein (from pMON10518) are comparable to or just slightly
lower than the highest levels observed for the truncated HD-73
genes (pMON5383 and pMON5390). Tomato plants expressing these
full length genes have the insecticidal activity expected for
the observed protein levels as determined by feeding assays
with beet armyworm or by diet incorporation of plant extracts
with tobacco hornworm.

TABLE XIII

Full Length B.t.k. HD-73 Protein and
mRNA Levels in Transgenic Tomato Plants

                                                    Relative
Gene                               B.t.k. protein   B.t.k.
description          Vector        concentration    mRNA level

Synthetic/wild type  pMON10506     100              1
Synthetic/modified   pMON10526     400              2-4
Fully synthetic      pMON10518     2000             10


c) Cotton. 

The generality of the increased expression of B.t.k. HD-1 and
B.t.k. HD-73 by use of the modified and synthetic genes was
extended to cotton. Transgenic calli were produced which
contain the wild type (pMON9921) and the synthetic HD-1
(pMON5377) genes. Here again the B.t.k. HD-1 protein produced
from calli containing the wild-type gene was not detected,
whereas calli containing the synthetic HD-1 gene expressed the
HD-1 protein at easily detectable levels. The HD-1 protein was
produced at approximately 1000 ng/mg of plant calli extract
protein. Again, to ensure that the protein produced by the
transgenic cotton calli was biologically active and that the
increased expression observed with the synthetic gene
translated to increased biological activity, extracts of
cotton calli were made in a similar manner as described for
tobacco plants, except that the calli were first dried between
Whatman filter paper to remove as much of the water as
possible. The dried calli were then ground in liquid nitrogen
and ground in 100 mM sodium carbonate buffer, pH 10.
Approximately 0.5 ml aliquots of this material were applied to
tomato leaves with a paint brush. After the leaf dried, five
tobacco hornworm larvae were applied to each of two leaf
samples. Leaves painted with extract from control calli were
completely destroyed. Leaves painted with extract from calli
containing the wild-type HD-1 gene (pMON9921) showed severe
damage. Leaves painted with extract from calli containing the
synthetic HD-1 gene (pMON5377) showed no damage (see Table
XIV below).

TABLE XIV

Protection against Tobacco Hornworm by Tomato Leaves
Painted with Extracts Prepared from Cotton Calli
Containing a Control, the Wild-Type B.t.k. HD-1 Gene,
Synthetic HD-1 Gene or Synthetic HD-73 Gene

Gene                             Tobacco Hornworm
Description       Vector         Damage*

Control           Control        NL
Wild type HD-1    pMON9921       3
Synthetic HD-1    pMON5377       0
Synthetic HD-73   pMON5383       0

*Damage was rated as shown in Table VIII.

Cotton calli were also produced containing another synthetic
gene, a gene encoding B.t.k. HD-73. The preparation of this
gene is described in Example 3. Calli containing the synthetic
HD-73 gene produced the corresponding HD-73 protein at even
higher levels than the calli which contained the synthetic
HD-1 gene. Extracts made from calli containing the HD-73
synthetic gene (pMON5383) showed complete control of tobacco
hornworm when painted onto tomato leaves as described above
for extracts containing the HD-1 protein (see Table XIV).

Transgenic cotton plants containing the synthetic B.t.k. HD-1
gene (pMON5377) or the synthetic B.t.k. HD-73 gene (pMON5383)
have also been examined. These plants produce the HD-1 or
HD-73 proteins at levels comparable to that seen in cotton
callus with the same genes and comparable to tomato and
tobacco plants with these genes. For either synthetic
truncated HD-1 or HD-73 genes, cotton plants expressing B.t.k.
protein at 1000 to 2000 ng/mg total protein (0.1% to 0.2%)
were recovered at a high frequency. Insect feeding assays were
performed with leaves from cotton plants expressing the
synthetic HD-1 or HD-73 genes. These leaves showed no damage
(rating of 0) when challenged with larvae of cabbage looper
(Trichoplusia ni), and only slight damage when challenged with
larvae of beet armyworm (Spodoptera exigua). Damage ratings
are as defined in Table VIII above. This demonstrated that
cotton plants as well as calli expressed the synthetic HD-1 or
HD-73 genes at high levels and that those plants were
protected from damage by Lepidopteran insect larvae.
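
The text reports expression both in ng of B.t.k. protein per mg of total protein and as a percentage of total protein. The sketch below is in Python and is not part of the original specification. It shows the unit conversion between the two, using only the fact that 1 mg equals 1,000,000 ng; the function names are illustrative.

```python
NG_PER_MG = 1_000_000  # 1 mg = 1,000,000 ng


def ng_per_mg_to_percent(ng_per_mg):
    """Convert ng of target protein per mg of total protein to percent."""
    return ng_per_mg * 100 / NG_PER_MG


def percent_to_ng_per_mg(percent):
    """Convert percent of total protein to ng of target protein per mg."""
    return percent * NG_PER_MG / 100


# Values quoted for cotton plants: 1000 to 2000 ng/mg (0.1% to 0.2%).
for ng in (1000, 2000):
    print(f"{ng} ng/mg = {ng_per_mg_to_percent(ng):.1f}% of total protein")

# The same conversion applied to the percentages quoted elsewhere in the text.
for pct in (0.01, 0.05, 0.2):
    print(f"{pct}% = {percent_to_ng_per_mg(pct):.0f} ng/mg")
```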

Transgenic cotton plants containing either the synthetic
truncated HD-1 gene (pMON5377) or the synthetic truncated
HD-73 gene (pMON5383) were also assessed for protection
against cotton bollworm at the whole plant level in the
greenhouse. This is a more realistic test of the ability of
these plants to produce an agriculturally acceptable level of
control. The cotton bollworm (Heliothis zea) is a major pest of
cotton that produces economic damage by destroying terminals,
squares and bolls, and protection of these fruiting bodies as
well as the leaf tissue will be important for effective insect
control and adequate crop protection. To test the protection
afforded to whole plants, R1 progeny of cotton plants
expressing high levels of either B.t.k. HD-1 (pMON5377) or
B.t.k. HD-73 (pMON5383) were assayed by applying 10-15 eggs of
cotton bollworm per boll or square to the 20 uppermost squares
or bolls on each plant. At least 12 plants were analyzed per
treatment. The hatch rate of the eggs was approximately 70%.
This corresponds to very high insect pressure compared to
numbers of larvae per plant seen under typical field
conditions. Under these conditions 100% of the bolls on
control cotton plants were destroyed by insect damage. For the
transgenics, significant boll protection was observed. Plants
containing pMON5377 (HD-1) had 70-75% of the bolls survive the
intense pressure of this assay. Plants containing pMON5383
(HD-73) had 80% to 90% boll protection. This is likely to be a
consequence of the higher activity of HD-73 protein against
cotton bollworm compared to HD-1 protein. In cases where the
transgenic plants were damaged by the insects, the surviving
larvae were delayed in their development by at least one
instar.
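
The insect pressure in this assay can be estimated from the figures given: 10-15 eggs per site, 20 sites per plant and a hatch rate of approximately 70%. The sketch below is in Python and is not part of the original specification. The function name is illustrative, and the calculation assumes every hatched egg yields one larva on the plant.

```python
def larvae_per_plant(eggs_per_site, sites_per_plant, hatch_rate):
    """Expected number of hatched larvae per plant, assuming each hatched
    egg yields one larva and no larvae are lost before feeding."""
    return eggs_per_site * sites_per_plant * hatch_rate


SITES_PER_PLANT = 20   # uppermost squares or bolls infested per plant
HATCH_RATE = 0.70      # approximate hatch rate quoted in the text

low = larvae_per_plant(10, SITES_PER_PLANT, HATCH_RATE)
high = larvae_per_plant(15, SITES_PER_PLANT, HATCH_RATE)

print(f"expected larvae per plant: about {low:.0f} to {high:.0f}")
```

Under these assumptions each plant carries on the order of 140 to 210 larvae, which is the load the text characterizes as very high compared to typical field conditions.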

Therefore, the increased expression obtained with the modified
and synthetic genes is not limited to any one crop; tobacco,
tomato and cotton calli and cotton plants all showed drastic
increases in B.t.k. expression when the plants/calli were
produced containing the modified or synthetic genes. Likewise,
the utility of changes made to produce the modified and
synthetic B.t.k. HD-1 gene is not limited to the HD-1 gene.
The synthetic HD-73 gene in all three species also showed
drastic increases in expression.

In summary, it has been demonstrated that: (1) the genetic
changes made in the HD-1 modified gene lead to very
significant increases in B.t.k. HD-1 expression; (2)
production of a totally synthetic gene led to a further
five-fold increase in B.t.k. HD-1 expression; (3) the changes
incorporated into the modified HD-1 gene accounted for the
majority of the increased B.t.k. expression observed with the
synthetic gene; (4) the increased expression was demonstrated
in three different plants: tobacco plants, tomato plants, and
cotton calli and cotton plants; (5) the increased expression
as observed by Western analysis also correlated with similar
increases in bioactivity, showing that the B.t.k. HD-1
proteins produced were comparably active; (6) when the method
of the present invention used to design the synthetic HD-1
gene was employed to design a synthetic HD-73 gene, it also
was expressed at much higher levels in tobacco, tomato and
cotton than the wild-type equivalent gene, with consequent
increases in bioactivity; (7) a fully synthetic full length
B.t.k. gene was expressed at levels comparable to synthetic
truncated genes.

Example 5 — Synthetic B.t. tenebrionis Gene in
Tobacco, Tomato and Potato

Referring to FIG. 12, a synthetic gene encoding a Coleopteran
active toxin is prepared by making the indicated changes in
the wild-type gene of B.t. tenebrionis or de novo synthesis of
the synthetic structural gene. The synthetic gene is inserted
into an intermediate plant transformation vector such as
pMON893. Plasmid pMON893 containing the synthetic B.t.t. gene
is then inserted into a suitable disarmed Agrobacterium strain
such as A. tumefaciens ACO.



Transformation and Regeneration of Potato 






Sterile shoot cultures of Russet Burbank are maintained in
vials containing 10 ml of PM medium (Murashige and Skoog (MS)
inorganic salts, 30 g/l sucrose, 0.17 g/l NaH2PO4.H2O, 0.4
mg/l thiamine-HCl, and 100 mg/l myo-inositol, solidified with
1 g/l Gelrite at pH 6.0). When shoots reach approximately 5 cm
in length, stem internode segments of 7-10 mm are excised and
smeared at the cut ends with a disarmed Agrobacterium
tumefaciens vector containing the synthetic B.t.t. gene from a
four day old plate culture. The stem explants are co-cultured
for three days at 23° C. on a sterile filter paper placed over
1.5 ml of a tobacco cell feeder layer overlaid on 1/10 P
medium (1/10 strength MS inorganic salts and organic addenda
without casein as in Jarret et al. (1980), 30 g/l sucrose and
8.0 g/l agar). Following co-culture the explants are
transferred to full strength P-1 medium for callus induction,
composed of MS inorganic salts, organic additions as in Jarret
et al. (1980) with the exception of casein, 3.0 mg/l
benzyladenine (BA), and 0.01 mg/l naphthaleneacetic acid (NAA)
(Jarret et al., 1980). Carbenicillin (500 mg/l) is included to
inhibit bacterial growth, and 100 mg/l kanamycin is added to
select for transformed cells. After four weeks the explants
are transferred to medium of the same composition but with 0.3
mg/l gibberellic acid (GA3) replacing the BA and NAA (Jarret
et al., 1981) to promote shoot formation. Shoots begin to
develop approximately two weeks after transfer to shoot
induction medium; these are excised and transferred to vials
of PM medium for rooting. Shoots are tested for kanamycin
resistance conferred by the enzyme neomycin phosphotransferase
II, by placing a section of the stem onto callus induction
medium containing MS organic and inorganic salts, 30 g/l
sucrose, 2.25 mg/l BA, 0.186 mg/l NAA, 10 mg/l GA3 (Webb et
al., 1983) and 200 mg/l kanamycin to select for transformed
cells.

The synthetic B.t.t. gene described in FIG. 12 was placed
into a plant expression vector as described in example 5.
The plasmid has the following characteristics: a synthetic
BglII fragment having approximately 1800 base pairs was
inserted into pMON893 in such a manner that the enhanced
35S promoter would express the B.t.t. gene. This construct,
pMON1982, was used to transform both tobacco and tomato.
Tobacco plants, selected as kanamycin resistant plants,
were screened with rabbit anti-B.t.t. antibody.
Cross-reactive material was detected at levels predicted to
be suitable to cause mortality to CPB. These target insects
will not feed on tobacco, but the transgenic tobacco plants
do demonstrate that the synthetic gene does improve
expression of this protein to detectable levels.

Tomato plants with the pMON1982 construct were determined
to produce B.t.t. protein at levels insecticidal to CPB. In
initial studies, the leaves of four plants (5190, 5225,
5328 and 5133) showed little or no damage when exposed to
CPB larvae (damage rating of 0-1 on a scale of 0 to 4 with
4 as no leaf remaining). Under these conditions the control
leaves were completely eaten. Immunological analysis of
these plants confirmed the presence of material
cross-reactive with anti-B.t.t. antibody. Levels of protein
expression in these plants were estimated at approximately
1 to 5 ng of B.t.t. protein in 50 ug of total extractable
protein. A total of 17 tomato plants (17 of 65 tested) have
been identified which demonstrate protection of leaf tissue
from CPB (rating of 0 or 1) and show good insect mortality.

Results similar to those seen in tobacco and tomato with 
pMON1982 were seen with pMON1984 in the same plant 
species. pMON1984 is identical to pMON1982 except that 
the synthetic protease inhibitor (CMTI) is fused upstream of 
the native proteolytic cleavage site. Levels of expression in 
tobacco were estimated to be similar to pMON1982, 
between 10-15 ng per 50 ug of total soluble protein.

Tomato plants expressing pMON1984 have been identi- 
fied which protect the leaves from ingestion by CPB. The 
damage rating was 0 with 100% insect mortality. 

Potato was transformed as described in example 5 with a
vector similar to pMON1982 containing the enhanced
CaMV35S/synthetic B.t.t. gene. Leaves of potato plants
transformed with this vector were screened by CPB insect
bioassay. Of the 35 plants tested, leaves from 4 plants,
16a, 13c, 13d, and 23a, were totally protected when
challenged. Insect bioassays with leaves from three other
plants, 13e, 1a, and 13b, recorded damage levels of 1 on a
scale of 0 to 4 with 4 being total devastation of the leaf
material. Immunological analysis confirmed the presence of
B.t.t. cross-reactive material in the leaf tissue. The
level of B.t.t. protein in leaf tissue of plant 16a (damage
rating of 0) was estimated at 20-50 ng of B.t.t. protein/50
ug of total soluble protein. The levels of B.t.t. protein
seen in 16a tissue were consistent with its biological
activity. Immunological analysis of 13e and 13b (tissue
which scored 1 in damage rating) revealed less protein
(5-10 ng/50 ug of total soluble protein) than in plant 16a.
Cuttings of plant 16a were challenged with 50 to 200 eggs
of CPB in a whole plant assay. Under these conditions 16a
showed no damage and 100% mortality of insects while
control potato plants were heavily damaged.
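
The expression levels above are quoted as nanograms of B.t.t.
protein per 50 ug of total soluble protein. As a rough aid to
comparing such figures with percentages of total protein, the
unit conversion can be sketched as follows. The function name
is illustrative only and is not part of the original
disclosure.

```python
def percent_of_total_protein(ng_target: float, ug_total: float = 50.0) -> float:
    """Convert ng of target protein per ug_total of total protein to percent (w/w)."""
    if ug_total <= 0:
        raise ValueError("total protein must be positive")
    ng_total = ug_total * 1000.0  # 1 ug = 1000 ng
    return 100.0 * ng_target / ng_total


# Plant 16a: 20-50 ng of B.t.t. protein per 50 ug of total soluble protein
low = percent_of_total_protein(20)
high = percent_of_total_protein(50)
print(f"{low:.2f}% to {high:.2f}% of total soluble protein")
```

On this conversion, 20-50 ng per 50 ug corresponds to 0.04% to
0.10% of total soluble protein.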

Example 6 — Synthetic B.t.k. P2 Protein Gene

The P2 protein is a distinct insecticidal protein produced
by some strains of B.t. including B.t.k. HD-1. It is
characterized by its activity against both lepidopteran and
dipteran insects (Yamamoto and Iizuka, 1983). Genes
encoding the P2 protein have been isolated and
characterized (Donovan et al., 1988). The P2 proteins
encoded by these genes are approximately 600 amino acids in
length. These proteins share only limited homology with the
lepidopteran specific P1 type proteins, such as the B.t.k.
HD-1 and HD-73 proteins described in previous examples.

The P2 proteins have substantial activity against a variety
of lepidopteran larvae including cabbage looper, tobacco
hornworm and tobacco budworm. Because they are active
against agronomically important insect pests, the P2
proteins are a desirable candidate in the production of
insect tolerant transgenic plants either alone or in
combination with the other B.t. toxins described in the
above examples. In some plants, expression of the P2
protein alone might be sufficient to provide protection
against damaging insects. In addition, the P2 proteins
might provide protection against agronomically important
dipteran pests. In other cases, expression of P2 together
with the B.t.k. HD-1 or HD-73 protein might be preferred.
The P2 proteins should provide at least an additive level
of insecticidal activity when combined with the crystal
protein toxin of B.t.k. HD-1 or HD-73, and the combination
may even provide a synergistic activity. Although the mode
of action of the P2 protein is unknown, its distinct amino
acid sequence suggests that it functions differently from
the B.t.k. HD-1 and HD-73 type of proteins. Production of
two insect tolerance proteins with different modes of
action in the same plant would minimize the potential for
development of insect resistance to B.t. proteins in
plants. The lack of substantial DNA homology between P2
genes and the HD-1 and HD-73 genes minimizes the potential
for recombination between multiple insect tolerance genes
in the plant chromosome.

The genes encoding the P2 protein, although distinct in
sequence from the B.t.k. HD-1 and HD-73 genes, share many
common features with these genes. In particular, the P2
protein genes have a high A+T content (65%), multiple
potential polyadenylation signal sequences (26) and
numerous ATTTA sequences (10). Because of its overall
similarity to the poorly expressed wild-type B.t.k. HD-1
and HD-73 genes, the same problems are expected in
expression of the wild-type P2 gene as were encountered
with the previous examples. Based on the above-described
method for designing the synthetic B.t. genes, a synthetic
P2 gene has been designed which gene should be expressed at
adequate levels for protection in plants. A comparison of
the wild-type and synthetic P2 genes is shown in FIG. 13.
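
The three sequence features tallied above (A+T content,
potential polyadenylation signals, and ATTTA motifs) can be
counted mechanically. The following is a minimal sketch, not
the procedure actually used to produce the figures quoted in
this specification. The function name and the toy sequence
are hypothetical, and the default signal list contains only
the two hexamers named in Example 9 (AACCAA and AAGCAT); the
complete list of signals considered is an input the reader
would have to supply.

```python
def sequence_features(seq, polya_signals=("AACCAA", "AAGCAT")):
    """Return A+T percentage and overlapping motif counts for a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def count_overlapping(motif):
        # str.count skips overlapping hits, so step through with find()
        count, start = 0, seq.find(motif)
        while start != -1:
            count += 1
            start = seq.find(motif, start + 1)
        return count

    at = seq.count("A") + seq.count("T")
    return {
        "at_percent": 100.0 * at / len(seq),
        "attta": count_overlapping("ATTTA"),
        "polya": sum(count_overlapping(m) for m in polya_signals),
    }


# Toy sequence for illustration only (not a real gene)
toy = "ATGAACCAAATTTATTTAGC"
print(sequence_features(toy))
```

Overlapping matches are counted deliberately, since a run such
as ATTTATTTA contains two ATTTA motifs.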



Example 7 — Synthetic B.t. Entomocidus Gene

The B.t. entomocidus ("Btent") protein is a distinct
insecticidal protein produced by some strains of B.t.
bacteria. It is characterized by its high level of activity
against some lepidopterans that are relatively insensitive
to B.t.k. HD-1 and HD-73 such as Spodoptera species
including beet armyworm (Visser et al., 1988). Genes
encoding the Btent protein have been isolated and
characterized (Honee et al., 1988). The Btent proteins
encoded by these genes are approximately the same length as
B.t.k. HD-1 and HD-73. These proteins share only 68% amino
acid homology with the B.t.k. HD-1 and HD-73 proteins. It
is likely that only the N-terminal half of the Btent
protein is required for insecticidal activity as is the
case for HD-1 and HD-73. Over the first 625 amino acids,
Btent shares only 38% amino acid homology with HD-1 and
HD-73.

Because of their higher activity against Spodoptera species
that are relatively insensitive to HD-1 and HD-73, the
Btent proteins are a desirable candidate for the production
of insect tolerant transgenic plants either alone or in
combination with the other B.t. toxins described in the
above examples. In some plants production of Btent alone
might be sufficient to control the agronomically important
pests. In other plants, the production of two distinct
insect tolerance proteins would provide protection against
a wider array of insects. Against those insects where both
proteins are active, the combination of the B.t.k. HD-1 or
HD-73 type protein plus the Btent protein should provide at
least additive insecticidal efficacy, and may even provide
a synergistic activity. In addition, because of its
distinct amino acid sequence, the Btent protein may have a
different mode of action than HD-1 or HD-73. Production of
two insecticidal proteins in the same plant with different
modes of action would minimize the potential for
development of insect resistance to B.t. proteins in
plants. The relative lack of DNA sequence homology with the
B.t.k. type genes minimizes the potential for recombination
between multiple insect tolerance genes in the plant
chromosome.

The genes encoding the Btent protein, although distinct in
sequence from the B.t.k. HD-1 and HD-73 genes, share many
common features with these genes. In particular, the Btent
protein genes have a high A+T content (62%), multiple
potential polyadenylation signal sequences (39 in the full
length coding sequence and 27 in the first 1875 nucleotides
that are likely to encode the active toxic fragment) and
numerous ATTTA sequences (16 in the full length coding
sequence and 12 in the first 1875 nucleotides). Because of
its overall similarity to the poorly expressed wild type
B.t.k. HD-1 and HD-73 genes, the wild-type Btent genes are
expected to exhibit similar problems in expression as were
encountered with the wild-type HD-1 and HD-73 genes. Based
on the above-described method used for designing the other
synthetic B.t. genes, a synthetic Btent gene has been
designed which gene should be expressed at adequate levels
for protection in plants. A comparison of the wild type and
synthetic Btent genes is shown in FIG. 14.

Example 8 — Synthetic B.t.k. Genes for Expression
in Corn

High level expression of heterologous genes in corn cells
has been shown to be enhanced by the presence of a corn
gene intron (Callis et al., 1987). Typically these introns
have been located in the 5' untranslated region of the
chimeric gene. It has been shown that the CaMV35S promoter
and the NOS 3' end function efficiently in the expression
of heterologous genes in corn cells (Fromm et al., 1986).

Referring to FIG. 15, a plant expression cassette vector
(pMON744) was constructed that contains these sequences.
Specifically the expression cassette contains the enhanced
CaMV 35S promoter followed by intron 1 of the corn Adh1
gene (Callis et al., 1987). This is followed by a
multilinker cloning site for insertion of coding sequences;
this multilinker contains a BglII site, among others.
Following the multilinker is the NOS 3' end. pMON744 also
contains the selectable marker gene 35S/NPTII/NOS 3' for
kanamycin selection of transgenic corn cells. In addition,
pMON744 has an E. coli origin of replication and an
ampicillin resistance gene for selection of the plasmid in
E. coli.

Five B.t.k. coding sequences described in the previous
examples were inserted into the BglII site of pMON744 for
corn cell expression of B.t.k. The coding sequences
inserted and resulting vectors were:

1. Wild type B.t.k. HD-1 from pMON9921 to make
pMON8652.

2. Modified B.t.k. HD-1 from pMON5370 to make
pMON8642.

3. Synthetic B.t.k. HD-1 from pMON5377 to make
pMON8643.

4. Synthetic B.t.k. HD-73 from pMON5390 to make
pMON8644.

5. Synthetic full length B.t.k. HD-73 from pMON10518
to make pMON10902.

pMON8652 (wild-type B.t.k. HD-1) was used to transform corn
cell protoplasts and stably transformed kanamycin resistant
callus was isolated. B.t.k. mRNA in the corn cells was
analyzed by nuclease S1 protection and found to be present
at a level comparable to that seen with the same wild-type
coding sequence (pMON9921) in transgenic tomato plants.

pMON8652 and pMON8642 (modified HD-1) were used to
transform corn cell protoplasts in a transient expression
system. The level of B.t.k. mRNA was analyzed by nuclease
S1 protection. The modified HD-1 gave rise to a several
fold increase in B.t.k. mRNA compared to the wild-type
coding sequence in the transiently transformed corn cells.
This indicated that the modifications introduced into the
B.t.k. HD-1 gene are capable of enhancing B.t.k. expression
in monocot cells as was demonstrated for dicot plants and
cells.

pMON8642 (modified HD-1) and pMON8643 (synthetic HD-1) were
used to transform Black Mexican Sweet (BMS) corn cell
protoplasts by PEG-mediated DNA uptake, and stably
transformed corn callus was selected by growth on kanamycin
containing plant growth medium. Individual callus colonies
that were derived from single transformed cells were
isolated and propagated separately on kanamycin containing
medium.

To assess the expression of the B.t.k. genes in these
cells, callus samples were tested for insect toxicity by
bioassay against tobacco hornworm larvae. For each vector,
96 callus lines were tested by bioassay. Portions of each
callus were placed on sterile water agar plates, and five
neonate tobacco hornworm larvae were added and allowed to
feed for 4 days. For pMON8643, 100% of the larvae died
after feeding on 15 of the 96 calli and these calli showed
little feeding damage. For pMON8642, only 1 of the 96 calli
was toxic to the larvae. This showed that the B.t.k. gene
was being expressed in these samples at insecticidal
levels. The observation that significantly more calli
containing pMON8643 were toxic than for pMON8642 showed
that significantly higher levels of expression were
obtained when the synthetic HD-1 coding sequence was
contained in corn cells than when the modified HD-1 coding
sequence was used, similar to the previous examples with
dicot plants. A semiquantitative immunoassay showed that
the pMON8643 toxic samples had significantly higher B.t.k.
protein levels than the pMON8642 toxic sample.
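
The specification does not say which statistical test, if
any, underlies the statement that "significantly more" calli
were toxic with pMON8643 (15 of 96) than with pMON8642 (1 of
96). As one possible illustration only, a one-sided Fisher
exact test on those counts can be sketched with the standard
library. The function name is hypothetical, and the choice of
test is an assumption made here, not something stated in the
source.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) where X is hypergeometric: the number of
    'toxic' calli falling in row 1 given fixed row and column totals.
    """
    row1 = a + b          # calli tested with the first vector
    col1 = a + c          # toxic calli across both vectors
    n = a + b + c + d     # all calli
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, row1)


# pMON8643: 15 toxic, 81 non-toxic; pMON8642: 1 toxic, 95 non-toxic
p_value = fisher_one_sided(15, 81, 1, 95)
print(f"one-sided p = {p_value:.2e}")
```

On these counts the one-sided p-value falls well below 0.001,
which is consistent with the qualitative conclusion drawn in
the text.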

The 16 callus samples that were toxic to tobacco hornworm
were also tested for activity against European corn borer.
European corn borer is approximately 40-fold less sensitive
to the HD-1 gene product than is tobacco hornworm. Larvae
of European corn borer were applied to the callus samples
and allowed to feed for 4 days. Two of the 16 calli tested,
both of which contained pMON8643 (synthetic HD-1), were
toxic to European corn borer larvae.

To assess the expression of the B.t.k. genes in
differentiated corn tissue, another method of DNA delivery
was used. Young leaves were excised from corn plants, and
DNA samples were delivered into the leaf tissue by
microprojectile bombardment. In this system, the DNA on the
microprojectiles is transiently expressed in the leaf cells
after bombardment. Three DNA samples were used, and each
DNA was tested in triplicate.

1. pMON744, the corn expression vector with no B.t.k.
gene.

2. pMON8643 (synthetic HD-1).

3. pMON752, a corn expression vector for the GUS gene,
no B.t.k. gene.

The leaves were incubated at room temperature for 24 hours.
The pMON752 samples were stained with a substrate that
allows visual detection of the GUS gene product. This
analysis showed that over one hundred spots in each sample
were expressing the GUS product and the triplicate samples
showed very similar levels of GUS expression. For the
pMON744 and pMON8643 samples 5 larvae of tobacco hornworm
were added to each leaf and allowed to feed for 48 hours.
All three samples bombarded with pMON744 showed extensive
feeding damage and no larval mortality. All three samples
bombarded with pMON8643 showed no evidence of feeding
damage and 100% larval mortality. The samples were also
assayed for the presence of B.t.k. protein by a qualitative
immunoassay. All of the pMON8643 samples had detectable
B.t.k. protein. These results demonstrated that the
synthetic B.t.k. gene was expressed in differentiated corn
plant tissue at insecticidal levels.



Example 9 — Synthetic Potato Leaf Roll Virus Coat
Protein Gene

Expression in plants of the coat protein genes from a
variety of plant viruses has proven to be an effective
method of engineering resistance to these viruses. In order
to achieve virus resistance, it is important to express the
viral coat protein at an effective level. For many plant
virus coat protein genes, this has not proved to be a
problem. However, for the coat protein gene from potato
leaf roll virus (PLRV), expression of the coat protein has
been observed to be low relative to other coat protein
genes, and this lower level of protein has not led to
optimal resistance to PLRV.

The gene for PLRV coat protein is shown in FIG. 16.
Referring to FIG. 16, the upper line of sequence shows the
gene as it was originally engineered for plant expression
in vector pMON893. The gene was contained on a 749
nucleotide BglII-EcoRI fragment with the coding sequence
contained between nucleotides 20 and 643. This fragment
also contained 19 nucleotides of 5' noncoding sequence and
104 nucleotides of 3' noncoding sequence. This PLRV coat
protein gene was relatively poorly expressed in plants
compared to other viral coat protein genes.

A synthetic gene was designed to improve plant expression
of the PLRV coat protein. Referring again to FIG. 16, the
changes made in the synthetic PLRV gene are shown in the
lower line. This gene was designed to encode exactly the
same protein as the naturally occurring gene. Note that the
beginning of the synthetic gene is at nucleotide 14 and the
end of the sequence is at nucleotide 654. The coding
sequence for the synthetic gene is from nucleotide 20 to
643 of the figure. The changes indicated just upstream and
downstream of these endpoints serve only to introduce
convenient restriction sites just outside the coding
sequence. Thus the size of the synthetic gene is 641
nucleotides, which is smaller than the naturally occurring
gene. The synthetic gene is smaller because substantially
all of the noncoding sequence at both the 5' and 3' ends,
except for segments encoding the BglII and EcoRI
restriction sites, has been removed.

The synthetic gene differs from the naturally occurring
gene in two main respects. First, 41 individual codons
within the coding sequence have been changed to remove
nearly all codons for a given amino acid that constitute
less than about 15% of the codons for that amino acid in a
survey of dicot plant genes. Second, the 5' and 3'
noncoding sequences of the original gene have been removed.
Although not strictly conforming to the algorithm described
in FIG. 1, a few of the codon changes and especially the
removal of the long 3' noncoding region are consistent with
this algorithm.
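
The codon criterion described above (flagging codons that make
up less than about 15% of the codons for their amino acid) can
be sketched as follows. This is an illustration under stated
assumptions: the function names are hypothetical, and the
usage counts are invented toy numbers, not the dicot plant
gene survey relied on in this specification.

```python
def rare_codons(usage, threshold=0.15):
    """Return codons whose share of their amino acid's codons is below threshold.

    usage maps amino acid -> {codon: observed count}.
    """
    rare = set()
    for counts in usage.values():
        total = sum(counts.values())
        if total == 0:
            continue
        for codon, n in counts.items():
            if n / total < threshold:
                rare.add(codon)
    return rare


def flag_positions(coding_seq, rare):
    """Return zero-based codon indices in coding_seq that use a rare codon."""
    usable = len(coding_seq) - len(coding_seq) % 3  # ignore a trailing partial codon
    codons = [coding_seq[i:i + 3].upper() for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons) if codon in rare]


# Hypothetical counts for illustration only
toy_usage = {
    "Lys": {"AAA": 10, "AAG": 90},
    "Glu": {"GAA": 40, "GAG": 60},
}
rare = rare_codons(toy_usage)
print(sorted(rare))
print(flag_positions("ATGAAAGAGAAGAAA", rare))
```

Each flagged position is a candidate for replacement with a
more frequently used codon for the same amino acid, which
leaves the encoded protein unchanged.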

The original PLRV sequence contains two potential plant
polyadenylation signals (AACCAA and AAGCAT) and both of
these occur in the 3' noncoding sequence that has been
removed in the synthetic gene. The original PLRV gene also
contains one ATTTA sequence. This is also contained in the
3' noncoding sequence, and is in the midst of the longest
stretch of uninterrupted A+T in the gene (a stretch of 7
A+T nucleotides). This sequence was removed in the
synthetic gene. Thus, sequences that the algorithm of FIG.
1 targets for change have been changed in the synthetic
PLRV coat protein gene by removal of the 3' noncoding
segment. Within the coding sequence, codon changes were
also made to remove three other regions of sequence
described above. In particular, two regions of 5
consecutive A+T and one region of 5 consecutive G+C within
the coding sequence have been removed in the synthetic
gene.
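
Runs of consecutive A+T or consecutive G+C nucleotides, such
as the regions of 5 or more mentioned above, can be located
with a regular expression. The sketch below is illustrative
only; the function name and the toy sequence are hypothetical,
and the minimum run length is a parameter rather than a value
fixed by the specification.

```python
import re


def base_class_runs(seq, min_len=5):
    """Find runs of at least min_len consecutive A/T or consecutive G/C bases.

    Returns a list of (zero-based start, run, class label) tuples.
    """
    pattern = re.compile(r"[AT]{%d,}|[GC]{%d,}" % (min_len, min_len))
    runs = []
    for match in pattern.finditer(seq.upper()):
        label = "A+T" if match.group()[0] in "AT" else "G+C"
        runs.append((match.start(), match.group(), label))
    return runs


# Toy sequence for illustration only (not the PLRV gene)
for start, run, label in base_class_runs("GGATTTACCGCGCGAT"):
    print(start, run, label)
```

Within a coding sequence, a run reported this way could then
be interrupted by a synonymous codon change.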

The synthetic PLRV coat protein gene is cloned in a plant
transformation vector such as pMON893 and used to transform
potato plants as described above. These plants express the
PLRV coat protein at higher levels than achieved with the
naturally occurring gene, and these plants exhibit
increased resistance to infection by PLRV.

Example 10 — Expression of Synthetic B.t. Genes
with RUBISCO Small Subunit Promoters and
Chloroplast Transit Peptides

The genes in plants encoding the small subunit of RUBISCO
(SSU) are often highly expressed, light regulated and
sometimes show tissue specificity. These expression
properties are largely due to the promoter sequences of
these genes. It has been possible to use SSU promoters to
express heterologous genes in transformed plants. Typically
a plant will contain multiple SSU genes, and the expression
levels and tissue specificity of different SSU genes will
be different. The SSU proteins are encoded in the nucleus
and synthesized in the cytoplasm as precursors that contain
an N-terminal extension known as the chloroplast transit
peptide (CTP). The CTP directs the precursor to the
chloroplast and promotes the uptake of the SSU protein into
the chloroplast. In this process, the CTP is cleaved from
the SSU protein. These CTP sequences have been used to
direct heterologous proteins into chloroplasts of
transformed plants.

The SSU promoters might have several advantages for
expression of B.t.k. genes in plants. Some SSU promoters
are very highly expressed and could give rise to expression
levels as high or higher than those observed with the
CaMV35S promoter. The tissue distribution of expression
from SSU promoters is different from that of the CaMV35S
promoter, so for control of some insect pests, it may be
advantageous to direct the expression of B.t.k. to those
cells in which SSU is most highly expressed. For example,
although relatively constitutive, in the leaf the CaMV35S
promoter is more highly expressed in vascular tissue than
in some other parts of the leaf, while most SSU promoters
are most highly expressed in the mesophyll cells of the
leaf. Some SSU promoters also are more highly tissue
specific, so it could be possible to utilize a specific SSU
promoter to express B.t.k. in only a subset of plant
tissues, if for example B.t. expression in certain cells
was found to be deleterious to those cells. For example,
for control of Colorado potato beetle in potato, it may be
advantageous to use SSU promoters to direct B.t.t.
expression to the leaves but not to the edible tubers.

Utilizing SSU CTP sequences to localize B.t. proteins to
the chloroplast might also be advantageous. Localization of
the B.t. to the chloroplast could protect the protein from
proteases found in the cytoplasm. This could stabilize the
protein and lead to higher levels of accumulation of active
protein. B.t. genes containing the CTP could be used in
combination with the SSU promoter or with other promoters
such as CaMV35S.

A variety of plant transformation vectors were constructed
for the expression of B.t.k. genes utilizing SSU promoters
and SSU CTPs. The promoters and CTPs utilized were from the
petunia SSU11a gene described by Tumer et al. (1986) and
from the Arabidopsis ats1A gene (an SSU gene) described by
Krebbers et al. (1988) and by Elionor et al. (1989). The
petunia SSU11a promoter was contained on a DNA fragment
that extended approximately 800 bp upstream of the SSU
coding sequence. The Arabidopsis ats1A promoter was
contained on a DNA fragment that extended approximately 1.8
kb upstream of the SSU coding sequence. At the upstream end
convenient sites from the multilinker of pUC18 were used to
move these promoters into plant transformation vectors such
as pMON893. These promoter fragments extended to the start
of the SSU coding sequence at which point an NcoI
restriction site was engineered to allow insertion of the
B.t. coding sequence, replacing the SSU coding sequence.

When SSU promoters were used in combination with their CTP,
the DNA fragments extended through the coding sequence of
the CTP and a small portion of the mature SSU coding
sequence at which point an NcoI restriction site was
engineered by standard techniques to allow the in frame
fusion of B.t. coding sequences with the CTP. In
particular, for the petunia SSU11a CTP, B.t. coding
sequences were fused to the SSU sequence after amino acid 8
of the mature SSU sequence at which point the NcoI site was
placed. The 8 amino acids of mature SSU sequence were
included because preliminary in vitro chloroplast uptake
experiments indicated that uptake of B.t.k. was observed
only if this segment of mature SSU was included. For the
Arabidopsis ats1A CTP, the complete CTP was included plus
24 amino acids of mature SSU sequence plus the sequence
gly-gly-arg-val-asn-cys-met-gln-ala-met, terminating in an
NcoI site for B.t. fusion. This short sequence reiterates
the native SSU CTP cleavage site (between the cys and met)
plus a short segment surrounding the cleavage site. This
sequence was included in order to ensure proper uptake into
chloroplasts. B.t. coding sequences were fused to this
ats1A CTP after the met codon. In vitro uptake experiments
with this CTP construction and other (non-B.t.) coding
sequences showed that this CTP did target proteins to the
chloroplast.
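
To make the position of the reiterated cleavage site in the
ten-residue sequence easier to see, the three-letter sequence
given above can be converted to one-letter code and split
between the cys and the met that follows it. This is a
notational sketch only. The helper names are hypothetical, and
the code says nothing about where processing actually occurs
in a plant cell.

```python
# Standard three-letter to one-letter codes for the residues in the linker
THREE_TO_ONE = {
    "gly": "G", "arg": "R", "val": "V", "asn": "N",
    "cys": "C", "met": "M", "gln": "Q", "ala": "A",
}


def to_one_letter(peptide):
    """Convert a hyphen-separated three-letter peptide to one-letter code."""
    return "".join(THREE_TO_ONE[residue] for residue in peptide.lower().split("-"))


def split_at_cleavage(one_letter, site="CM"):
    """Split a peptide between the two residues of the first occurrence of site."""
    cut = one_letter.index(site) + 1  # cut falls after the cys
    return one_letter[:cut], one_letter[cut:]


linker = "gly-gly-arg-val-asn-cys-met-gln-ala-met"
one_letter = to_one_letter(linker)
upstream, downstream = split_at_cleavage(one_letter)
print(one_letter, upstream, downstream)
```

In this notation the B.t. coding sequence is fused after the
final met of the downstream segment.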

When CTPs were used in combination with the CaMV 35S
promoter, the same CTP segments were used. They were
excised just upstream of the ATG start sites of the CTP by
engineering of BglII sites, and placed downstream of the
CaMV35S promoter in pMON893, as BglII to NcoI fragments.
B.t. coding sequences were fused as described above.

The wild type B.t.k. HD-1 coding sequence of pMON9921 (see
FIG. 1) was fused to the ats1A promoter to make pMON1925 or
the ats1A promoter plus CTP to make pMON1921. These vectors
were used to transform tobacco plants, and the plants were
screened for activity against tobacco hornworm. No toxic
plants were recovered. This is surprising in light of the
fact that toxic plants could be recovered, albeit at a low
frequency, after transformation with pMON9921 in which the
B.t.k. coding sequence was expressed from the enhanced
CaMV35S promoter in pMON893, and in light of the fact that
Elionor et al. (1989) report that the ats1A promoter itself
is comparable in strength to the CaMV35S promoter and
approximately 10-fold stronger when the CTP sequence is
included. At least for the wild-type B.t.k. HD-1 coding
sequence, this does not appear to be the case.

A variety of plant transformation vectors were constructed
utilizing either the truncated synthetic HD-73 coding
sequence of FIG. 4 or the full length B.t.k. HD-73 coding
sequence of FIG. 11. These are listed in the table below.

TABLE XV
Gene Constructs with CTPs

Vector      Promoter   CTP      B.t.k. HD-73 Coding Sequence

pMON10806   En 35S     ats1A    truncated
pMON10814   En 35S     SSU11a   full length
pMON10811   SSU11a     SSU11a   truncated
pMON10819   SSU11a     none     truncated
pMON10815   ats1A      none     truncated
pMON10817   ats1A      ats1A    truncated
pMON10821   En 35S     ats1A    truncated
pMON10822   En 35S     ats1A    full length
pMON10838   SSU11a     SSU11a   full length
pMON10839   ats1A      ats1A    full length

All of the above vectors were used to transform tobacco
plants. For all of the vectors containing truncated B.t.k.
genes, leaf tissue from these plants has been analyzed for
toxicity to insects and B.t.k. protein levels by
immunoassay. pMON10806, 10811, 10819 and 10821 produce
levels of B.t.k. protein comparable to pMON5383 and
pMON5390 which contain synthetic B.t.k. HD-73 coding
sequences driven by the En 35S promoter itself with no CTP.
These plants also have the insecticidal activity expected
for the protein levels detected. For pMON10815 and
pMON10817 (containing the ats1A promoter), the level of
B.t.k. protein is about 5-fold higher than that found in
plants containing pMON5383 or 5390. These plants also have
higher insecticidal activity. Plants containing 10815 and
10817 contain up to 1% of their total soluble leaf protein
as B.t.k. HD-73. This is the highest level of B.t.k.
protein yet obtained with any of the synthetic genes.

This result is surprising in two respects. First, as noted
above, the wild type coding sequences fused to the ats1A
promoter and CTP did not show any evidence of higher levels
of expression than for En 35S, and in fact had lower
expression based on the absence of any insecticidal plants.
Second, Elionor et al. (1989) show that for two other
genes, the ats1A CTP can increase expression from the ats1A
promoter by about 10-fold. For the synthetic B.t.k. HD-73
gene, there is no consistent increase seen by including the
CTP over and above that seen for the ats1A promoter alone.

Tobacco plants containing the full length synthetic HD-73
fused to the SSU11a CTP and driven by the En 35S promoter
produced levels of B.t.k. protein and insecticidal activity
comparable to pMON10518, which does not include the CTP. In
addition, for pMON10518 the B.t.k. protein extracted from
plants was observed by gel electrophoresis to contain
multiple forms less than full length, apparently due to the
cleavage of the C-terminal portion (not required for
toxicity) in the cytoplasm. For pMON10814, the majority of
the protein appeared to be intact full length, indicating
that the protein has been stabilized from proteolysis by
targeting to the chloroplast.

Example 11 — Targeting of B.t. Proteins to the
Extracellular Space or Vacuole through the Use of
Signal Peptides

The B.t. proteins produced from the synthetic genes
described here are localized to the cytoplasm of the plant
cell, and this cytoplasmic localization results in plants
that are insecticidally effective. It may be advantageous
for some purposes to direct the B.t. proteins to other
compartments of the plant cell. Localizing B.t. proteins in
compartments other than the cytoplasm may result in less
exposure of the proteins to cytoplasmic proteases, leading
to greater accumulation of the protein and yielding
enhanced insecticidal activity. Extracellular localization
could lead to more efficient exposure of certain insects to
the B.t. proteins, leading to greater efficacy. If a B.t.
protein were found to be deleterious to plant cell
function, then localization to a noncytoplasmic compartment
could protect these cells from the protein.

In plants as well as other eucaryotes, proteins that are
destined to be localized either extracellularly or in
several specific compartments are typically synthesized
with an N-terminal amino acid extension known as the signal
peptide. This signal peptide directs the protein to enter
the compartmentalization pathway, and it is typically
cleaved from the mature protein as an early step in
compartmentalization. For an extracellular protein, the
secretory pathway typically involves cotranslational
insertion into the endoplasmic reticulum with cleavage of
the signal peptide occurring at this stage. The mature
protein then passes through the Golgi body into vesicles
that fuse with the plasma membrane, thus releasing the
protein into the extracellular space. Proteins destined for
other compartments follow a similar pathway. For example,
proteins that are destined for the endoplasmic reticulum or
the Golgi body follow this scheme, but they are
specifically retained in the appropriate compartment. In
plants, some proteins are also targeted to the vacuole,
another membrane bound compartment in the cytoplasm of many
plant cells. Vacuole targeted proteins diverge from the
above pathway at the Golgi body where they enter vesicles
that fuse with the vacuole.

A common feature of this protein targeting is the signal
peptide that initiates the compartmentalization process.
Fusing a signal peptide to a protein will in many cases
lead to the targeting of that protein to the endoplasmic
reticulum. The efficiency of this step may depend on the
sequence of the mature protein itself as well. The signals
that direct a protein to a specific compartment rather than
to the extracellular space are not as clearly defined. It
appears that many of the signals that direct the protein to
specific compartments are contained within the amino acid
sequence of the mature protein. This has been shown for
some vacuole targeted proteins, but it is not yet possible
to define these sequences precisely. It appears that
secretion into the extracellular space is the "default"
pathway for a protein that contains a signal sequence but
no other compartmentalization signals. Thus, a strategy to
direct B.t. proteins out of the cytoplasm is to fuse the
synthetic B.t. genes to DNA sequences encoding known plant
signal peptides. These fusion genes will give rise to B.t.
proteins that enter the secretory pathway, and lead to
extracellular secretion or targeting to the vacuole or
other compartments.

Signal sequences for several plant genes have been
described. One such sequence is for the tobacco pathogenesis-related
protein PR1b described by Cornelissen et al. The
PR1b protein is normally localized to the extracellular
space. Another type of signal peptide is contained on seed
storage proteins of legumes. These proteins are localized to
the protein body of seeds, which is a vacuole-like compartment
found in seeds. A signal peptide DNA sequence for the
beta' subunit of the 7S storage protein of common bean
(Phaseolus vulgaris), PvuB, has been described by Doyle et
al. Based on these published sequences, genes
were synthesized by chemical synthesis of oligonucleotides
that encoded the signal peptides for PR1b and PvuB. The
synthetic genes for these signal peptides corresponded
exactly to the reported DNA sequences. Just upstream of the
translational initiation codon of each signal peptide, a BamHI
and a BglII site were inserted, with the BamHI site at the 5'
end. This allowed the insertion of the signal peptide encoding
segments into the BglII site of pMON893 for expression
from the En 35S promoter. In some cases, to achieve secretion
or compartmentalization of heterologous proteins, it has
proved necessary to include some amino acid sequence
beyond the normal cleavage site of the signal peptide. This
may be necessary to ensure proper cleavage of the signal
peptide. For PR1b, the synthetic DNA sequence also
included the first 10 amino acids of mature PR1b. For PvuB,
the synthetic DNA sequence included the first 13 amino
acids of mature PvuB. Both synthetic signal peptide encoding
segments ended with NcoI sites to allow fusion in frame
to the methionine initiation codon of the synthetic B.t. genes.
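
As an illustration of the in-frame fusion described above, the following Python sketch models the join at the sequence level. It is not taken from the patent: the function name `fuse_at_ncoi` and both input sequences are hypothetical placeholders (they are not the PR1b, PvuB, or B.t. sequences). The only facts assumed are that the NcoI recognition site is CCATGG and that the ATG inside that site serves as the methionine initiation codon of the downstream gene.

```python
# Illustrative sketch only; the sequences below are made-up placeholders.
NCOI_SITE = "CCATGG"  # NcoI recognition site; its internal ATG can act as a start codon


def fuse_at_ncoi(signal_segment: str, gene: str) -> str:
    """Join a signal-peptide coding segment to a gene through a shared NcoI site.

    signal_segment: coding sequence starting at its own ATG and ending in CCATGG.
    gene: coding sequence starting with ATGG (the 3' part of an NcoI site).
    Returns the fused coding sequence, or raises ValueError if the gene's
    ATG would not be in frame with the signal peptide.
    """
    if not signal_segment.startswith("ATG"):
        raise ValueError("signal segment must begin with its initiation codon")
    if not signal_segment.endswith(NCOI_SITE):
        raise ValueError("signal segment must end with an NcoI site")
    if not gene.startswith("ATGG"):
        raise ValueError("gene must begin with ATGG so the NcoI site is restored")
    upstream = signal_segment[:-4]  # drop the ATGG that the gene itself supplies
    if len(upstream) % 3 != 0:
        raise ValueError("gene ATG would not be in frame with the signal peptide")
    return upstream + gene


# Hypothetical inputs: a 5-codon "signal" ending in an NcoI site, and a short gene.
signal = "ATGGCTTCTTCCGCCATGG"
gene = "ATGGCTATAGAAACTGGT"
fused = fuse_at_ncoi(signal, gene)
print(fused)
```

The frame check simply requires that the number of nucleotides ahead of the shared ATG be a multiple of three, which is the sequence-level meaning of "fusion in frame to the methionine initiation codon".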

Four vectors encoding synthetic B.t.k. HD-73 genes were
constructed containing these signal peptides. The synthetic
truncated HD-73 gene from pMON5383 was fused with the
signal peptide sequence of PvuB and incorporated into
pMON893 to create pMON10827. The synthetic truncated
HD-73 gene from pMON5383 was also fused with the signal
peptide sequence of PR1b to create pMON10824. The full
length synthetic HD-73 gene from pMON10518 was fused
with the signal peptide sequence of PvuB and incorporated
into pMON893 to create pMON10828. The full length
synthetic HD-73 gene from pMON10518 was also fused
with the signal peptide sequence of PR1b and incorporated
into pMON893 to create pMON10825.

These vectors were used to transform tobacco plants and
the plants were assayed for expression of the B.t.k. protein
by Western blot analysis and for insecticidal efficacy.
pMON10824 and pMON10827 produced amounts of B.t.k.
protein in leaf comparable to the truncated HD-73 vectors,
pMON5383 and pMON5390. pMON10825 and
pMON10828 produced full length B.t.k. protein in amounts
comparable to pMON10518. In all cases, the plants were
insecticidally active against tobacco hornworm.

We claim:

1. A modified chimeric gene comprising a promoter which
functions in plant cells operably linked to a structural coding
sequence and a 3' non-translated region comprising a polyadenylation
signal which functions in plants to cause the
addition of polyadenylate nucleotides to the 3' end of the
RNA, wherein said structural coding sequence encodes a
toxin protein derived from a Bacillus thuringiensis protein,
wherein said structural coding sequence comprises a DNA
sequence which differs from the naturally occurring DNA
sequence encoding said Bacillus thuringiensis protein and
comprises the following characteristics:

said naturally occurring DNA sequence comprises a
region having the following sequence:

TTAATTAACCAAAGAATAGAAGAATTCGCTAGGAAC
1   5   10   15   20   25   30   35

and said structural coding sequence has been modified,
said modifications comprising at least one modification
in said region selected from the group consisting of:

nucleotide 1 is a cytosine (C);

nucleotide 3 is a cytosine (C);

nucleotide 6 is a cytosine (C);

nucleotide 12 is a guanine (G);

nucleotide 18 is a cytosine (C);

nucleotide 24 is a guanine (G); and

nucleotide 36 is a thymine (T).

2. The modified chimeric gene of claim 1 wherein said 
modifications increase the number of plant preferred codons 
in said structural coding sequence. 

3. The modified chimeric gene of claim 1 wherein said
Bacillus thuringiensis is Bacillus thuringiensis var. kurstaki.

4. A modified chimeric gene comprising a promoter which
functions in plant cells operably linked to a structural coding
sequence and a 3' non-translated region comprising a polyadenylation
signal which functions in plants to cause the
addition of polyadenylate nucleotides to the 3' end of the
RNA, wherein said structural coding sequence encodes a
toxin protein derived from a Bacillus thuringiensis protein,
wherein said structural coding sequence comprises a DNA
sequence which differs from the naturally occurring DNA
sequence encoding said Bacillus thuringiensis protein and
comprises the following characteristics:

said naturally occurring DNA sequence comprises a
region having the following sequence:

TTAATTAACCAAAGAATAGAAGAATTCGCTAGGAAC
1   5   10   15   20   25   30   35

and where said structural coding sequence comprises
modifications so that at least said region contains at
least one fewer sequence selected from the group
consisting of plant polyadenylation sequences and an
ATTTA sequence.

5. The modified chimeric gene of claim 4 wherein said
modifications increase the number of plant preferred codons
in said structural coding sequence.

6. The modified chimeric gene of claim 4 wherein said
Bacillus thuringiensis is Bacillus thuringiensis var. kurstaki.

7. A modified chimeric gene comprising a promoter which
functions in plant cells operably linked to a structural coding
sequence and a 3' non-translated region comprising a polyadenylation
signal which functions in plants to cause the
addition of polyadenylate nucleotides to the 3' end of the
RNA, wherein said structural coding sequence encodes a
toxin protein derived from a Bacillus thuringiensis protein,
wherein said structural coding sequence comprises a DNA
sequence which differs from the naturally occurring DNA
sequence encoding said Bacillus thuringiensis protein and
comprises the following characteristics:

said naturally occurring DNA sequence comprises a
region having the following sequence:

TTAATTAACCAAAGAATAGAAGAATTCGCTAGGAAC
1   5   10   15   20   25   30   35

and where said structural coding sequence comprises
modifications so that at least said region contains at
least one fewer sequence selected from the group
consisting of an AACCAA and an AATTAA sequence.

8. The modified chimeric gene of claim 7 wherein said
modifications increase the number of plant preferred codons
in said structural coding sequence.

9. The modified chimeric gene of claim 7 wherein said 
Bacillus thuringiensis is Bacillus thuringiensis var. kurstaki. 

10. A transformed plant cell comprising a modified chimeric
gene which comprises a promoter which functions in
plant cells operably linked to a structural coding sequence
and a 3' non-translated region comprising a polyadenylation
signal which functions in plants to cause the addition of
polyadenylate nucleotides to the 3' end of the RNA, wherein
said structural coding sequence encodes a toxin protein
derived from a Bacillus thuringiensis protein, wherein said
structural coding sequence comprises a DNA sequence
which differs from the naturally occurring DNA sequence
encoding said Bacillus thuringiensis protein and has characteristics
comprising the following: said naturally occurring
DNA sequence comprises a region having the following
sequence:

TTAATTAACCAAAGAATAGAAGAATTCGCTAGGAAC
1   5   10   15   20   25   30   35

and said structural coding sequence has been modified, said
modifications comprising at least one modification in said
region selected from the group consisting of:

nucleotide 1 is a cytosine (C);

nucleotide 3 is a cytosine (C);

nucleotide 6 is a cytosine (C);

nucleotide 12 is a guanine (G);

nucleotide 18 is a cytosine (C);

nucleotide 24 is a guanine (G); and

nucleotide 36 is a thymine (T).

11. A transformed plant cell comprising a modified chimeric
gene which comprises a promoter which functions in
plant cells operably linked to a structural coding sequence
and a 3' non-translated region comprising a polyadenylation
signal which functions in plants to cause the addition of
polyadenylate nucleotides to the 3' end of the RNA, wherein
said structural coding sequence encodes a toxin protein
derived from a Bacillus thuringiensis protein, wherein said
structural coding sequence comprises a DNA sequence
which differs from the naturally occurring DNA sequence
encoding said Bacillus thuringiensis protein and has characteristics
comprising the following:

said naturally occurring DNA sequence comprises a
region having the following sequence:

TTAATTAACCAAAGAATAGAAGAATTCGCTAGGAAC
1   5   10   15   20   25   30   35

and where said structural coding sequence comprises
modifications so that at least said region contains at
least one fewer sequence selected from the group
consisting of plant polyadenylation sequences and an
ATTTA sequence.

12. A transformed plant cell comprising a modified chimeric
gene which comprises a promoter which functions in
plant cells operably linked to a structural coding sequence
and a 3' non-translated region comprising a polyadenylation
signal which functions in plants to cause the addition of
polyadenylate nucleotides to the 3' end of the RNA, wherein
said structural coding sequence encodes a toxin protein
derived from a Bacillus thuringiensis protein, wherein said
structural coding sequence comprises a DNA sequence
which differs from the naturally occurring DNA sequence
encoding said Bacillus thuringiensis protein and has characteristics
comprising the following:

said naturally occurring DNA sequence comprises a
region having the following sequence:

TTAATTAACCAAAGAATAGAAGAATTCGCTAGGAAC
1   5   10   15   20   25   30   35

and where said structural coding sequence comprises
modifications so that at least said region contains at
least one fewer sequence selected from the group
consisting of an AACCAA and an AATTAA sequence.
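
The nucleotide substitutions recited in claims 1 and 10, and the AACCAA and AATTAA sequences recited in claims 7 and 12, can be checked against the 36-nucleotide region with a short Python sketch. This is an illustration only and not part of the patent text. It rests on two assumptions: all seven substitutions are applied together (the claims require only "at least one"), and the region is read in frame from its first nucleotide, which appears consistent with its position at nucleotides 169 to 204 in the FIG. 2A numbering. The function names are hypothetical.

```python
# Illustrative sketch only; not part of the patent text.
NATIVE_REGION = "TTAATTAACCAAAGAATAGAAGAATTCGCTAGGAAC"  # 36-nt region recited in the claims

# Substitutions listed in claims 1 and 10, keyed by 1-based nucleotide position.
CLAIMED_SUBSTITUTIONS = {1: "C", 3: "C", 6: "C", 12: "G", 18: "C", 24: "G", 36: "T"}

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def apply_substitutions(seq, substitutions):
    """Return seq with each 1-based position replaced by the given base."""
    bases = list(seq)
    for position, base in substitutions.items():
        bases[position - 1] = base
    return "".join(bases)


def translate(seq):
    """Translate complete codons, reading from the first nucleotide (assumed frame)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def count_motif(seq, motif):
    """Count occurrences of motif in seq, including overlapping ones."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


modified = apply_substitutions(NATIVE_REGION, CLAIMED_SUBSTITUTIONS)
print(modified)
print(translate(NATIVE_REGION), translate(modified))
for motif in ("AATTAA", "AACCAA"):
    print(motif, count_motif(NATIVE_REGION, motif), "->", count_motif(modified, motif))
```

Under the stated frame assumption, the substituted region encodes the same amino acids as the native region while no longer containing either an AATTAA or an AACCAA sequence.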

1 ATGGCTATAGAAACTGGTTACACCCCAATCGATATTTCCT 4 0 

A 1 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 80 

8 1 TGCTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGA 120 

T C 

121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160 

161 TTGAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAG 2 00 

C C C G C G 

2 01 GAACC AAGCC ATTTCTAGATTAGAAGGACT AAGCAATCTT 2 4 0 

T 

241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG 2 80 

281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 3 20 

321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 3 60 

3 61 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAG 4 00 

CC C C 

4 01 TATATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAG 4 4 0 

G C C CC C CC C 

4 4 1 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 4 80 

4 81 GCG ACT ATC AAT AGTCGTTAT AATG ATTTA ACTAGGCTTA 5 20 

5 21 TTGGCAACTATACAGATCATGCTGTACGCTGGTACAATAC 5 60 

5 61 GGGATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGG 600 

601 ATAAGATATAATCAATTTAGAAGAGAATTAACACTAACTG 64 0 
CGCCGC GOT 

64 1 TATTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAG 680 

681 AACGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720

FIG. 2A

721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 760

• . • » . 

7 61 TTCGAGGCTCGGCTCAGGGCATAGAAGGAAGTATTAGGAG 800 

801 TCCACATTTG ATGGATATACTTAATAGTAT AACC ATCTAT 84 0 

841 ACGGATGCTCATAGAGGAGAATATTATTGGTCAGGGCATC 880 

C C C T C 

881 AAATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT 920 
G C 

921 CACTTTTCCGCTATATGG AACTATGGGAAATGCAGCTCCA 9 60 

9 61 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000 

1001 GAACATTATCGTCCACCTTATATAGAAGACCTTTTAATAT 1040

C 

104 1 AGGGATAAATAATCAACAACTATCTGTTCTTGACGGGACA 1080 
C C C C 

1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120 

1121 TATACAG AAAAAGCGGAACGGTAG ATTCGCTGG ATG AAAT 1160 

1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200 

12 01 AGTCATCGATTAAGCCATGTTTC AATGTTTCGTTC AGGCT 12 4 0 

1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 12 80 

1281 CTCTTGGATAC ATCGTAGTGCTG AATTTAATAATAT AATT 132 0 

G C C C C C 

1321 CCTTC ATC AC AAATTAC AC AAATACCTTTAAC AAAATCTA 13 60 

C C C AC C C G 

1361 CTAATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGG 1400

1401 ATTTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGC 1440

14 41 C AGATTTC AACCTTAAGAGTAAAT ATTACTGCACCATTAT 14 80 

14 81 C AC AAAG ATATCGGGTAAGAATTC GCTACG CTTCT ACC AC 1520 

1521 AAATTTACAATTCCATACATCAATTGACGGAAGACCTATT 1560 
CC T G C 

• • • • 

15 61 AATC AGGGGAATTTTTC AGCAACTATGAGTAGTGGG ACTA 1600 

1 601 ATTTAC AGTCCGGAAGCTTTAGGACTGTAGGTTTTACTAC 164 0 

164 1 TCCGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTA 1680 

1681 AGTGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAG 1720
1721 ATCGAATTGAATTTGTTCCGGCA 1743
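
The ATTTA sequences referred to in claims 4 and 11 can be located in any row of the figure with a simple scan. The Python sketch below is an illustration only, not part of the patent; the helper name `find_motif` is hypothetical, and the input is row 401 to 440 of FIG. 2A transcribed with the OCR spacing removed.

```python
# Illustrative sketch only; not part of the patent text.
def find_motif(seq, motif, first_position=1):
    """Return the numbered start positions of every occurrence of motif in seq.

    first_position is the figure number of the first nucleotide in seq,
    so results can be read directly against the figure's numbering.
    """
    return [
        first_position + i
        for i in range(len(seq) - len(motif) + 1)
        if seq.startswith(motif, i)
    ]


# Row 401-440 of FIG. 2A, transcribed with OCR spacing removed.
ROW_401 = "TATATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAG"
print(find_motif(ROW_401, "ATTTA", first_position=401))
```

For this row the scan reports two ATTTA sequences, starting at nucleotides 419 and 425.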

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 4 0 
CCA C AC 

4 1 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80 
C C G A T C T 

8 1 AAGAATAG AAACTGGTTACACCCC AATCGATATTTCCTTG 120 
CCT C TC CC 

• • ■ • 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160 
CTGAG GCCCGCGA 

• • • • 

161 CTGGATTTGTGTT AGGACTAGTTGATATAATATGGGG AAT 200 
GCTCC GCC T 

201 TTTTGGTCCCTCTCAATGGGACGC ATTTCTTGTAC AAATT 2 40 
C A T C G G 

2 41 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280 
G GC GGC G C 

2 81 ACCAAGCCATTTCTAGATTAGAAGGACTAAGC AATCTTTA 320 

G C G G T G C 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 3 60 
C C T GAGC C C 

3 61 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400 

C TC CC C G A 

4 01 TC AATG AC ATGAACAGTGCCCTTACAACCGCTATTCCTCT 4 4 0 

C CTGCA CAT 

4 4 1 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 4 80 
GC CGCC CGCG 

4 81 T ATGTTC AAGCTGC AAATTTACATTTATCAGTTTTG AG AG 520 

C A T C T CC CAGC GC TC 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 5 60 
C AGC G • C T 

5 61 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 

AC C CCCCT G 

601 GGCAACTATACAGATcATGCTGTaCGCTGGTACAATACGG 640 
A CCCCC TT CT 

64 1 GATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGGAT 680 
C G C T T 



FIG. 3A

681 AAGATATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720
T CCGCG GCCAT 

721 TTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAGAA 7 60. 
G C T G C C CTCC 

7 61 CGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800 

CCTCT G CTC 

801 TTATAC AAACCC AGTATTAG AAAATTTTGATGGTAGTTTT 84 0 
C T TCTGCCC CC 

8 41 CGAGGCTCGGCTCAGGGCATAGAAGGAAGTATTAGGAGTC 880 

T T T C A T C CTCC C C 

8 81 C AC ATTTG ATGG AT AT ACTT AATAGT AT AACC ATCTATAC 920 

C C CT G' C C T C 

921 GGATGCTCATAGAGGAGAATATTATTGGTCAGGGCATCAA 9 60 
C C G C TACG 

9 61 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCC AGAATTC A 1000 

C C ATA CAGC C G T 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 104 0 
CTC C C 

104 1 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080 
C T C C 

1081 ACATTATCGTCCACCTTATATAGAAGACCTTTTAATATAG 1120 
CGT GC CC C 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160 

TCCCG TC A 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 12 00 
G C C T T C T 

1201 TACAG AAAAAGCGG AACGGTAGATTCGCTGG ATGAAAT AC 12 4 0 
G C T CT C C 

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280 
A C T C CTC 

12 81 TC ATCG ATTAAGCC ATGTTTCAATGTTTCGTTC AGGCTTT 1320 

CCAGG CGC C CAC 

132 1 AGTAAT AGTAGTGT AAGTAT AATAAG AGCTCCTATGTTCT 13 60 
C C TCC G C C C 

1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTCC 1400

AT G C C C

1401 TTCATCACAAATTACACAAATACCTTTAACAAAATCTACT 1440

CT CC CAGCG 

14 4 1 AATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGGAT 14 80 
C A G C 

14 81 TTAC AGG AGG AGATATTCTTCGAAG AACTTC ACCTGGCC A 1520 
C T A T 

1521 GATTTCAACCTTAAGAGTAAATATTACTGCACCATTATCA 15 60 
AGC CC TCC CTT 

1561 CAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCACAA 1600 

T C G T A A 

1601 ATTTACAATTCCATACATCAATTG'ACGGAAGACCTATTAA 164 0 
CG CCCC G C 

164 1 TCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTAAT 1680 
T C C C C TCA CCCC 

1681 TTAC AGTCCGG AAGCTTTAGGACTGTAGGTTTTACTACTC 1720 
GA C CACC C 

1721 CGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTAAG 17 60 
TC CTC CTCCCT 

• • • ■ 

17 61 TGCTCATGTCTTC AATTCAGGCAATG AAGTTTATATAGAT 1800 
C G T G C T C 

1801 CGAATTGAATTTGTTCCGGCAGAAGTAACCTTTGAGGCAG 184 0 
T G GTC T C T 

184 1 AATAT 184 5 

G C 



FIG. 3C

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40
CCA C AC 

• • • * 

4 1 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80 
C C G A T C T 

8 1 AAGAATAGAAACTGGTTAC ACCCCAATCGATATTTCCTTG 120 
CCT C TC CC 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 1 60 
CTGAG GCCCGCGA 

161 CTGGATTTGTGTTAGGACTAGTTG ATAT AATATGGGGAAT 2 00 
GCTCC CCC T 

2 01 TTTTGGTCCCTCTC AATGGGACGC ATTTCTTGTACAAATT 24 0 
C A T C G G 

2 4 1 G AACAGTTAATTAACC AAAG AATAGAAGAATTCGCTAGG A 280 
G GC GGC G C 

2 81 ACC AAGCC ATTTCT AGATTAGAAGGACT AAGC AATCTTTA 320 

G C G G T G C 

321 TC AAATTT ACGC AG AATCTTTTAGAGAGTGGG AAGCAGAT 3 60 
C C T GAGC C C 

3 61 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 4 00 

C TC CC C G A 

4 01 TC AATG AC ATG AAC AGTGCCCTTACAACCGCT ATTCCTCT 4 4 0 

C CTGCA CAT 

4 4 1 TTTTGC AGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 4 80 

GC CGCC CGCG

4 81 TATGTTCAAGCTGC AAATTT ACATTTATCAGTTTTGAG AG 5 20 

C A T C T CC CAGC GC TC 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 5 60 
C AGC . G C T 

5 61 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 

AC C CCCCT G 

60 1 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 64 0 
A CCCCC TT CT 

64 1 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680 
C G G C T T A 



FIG. 4A

681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720
TACCGC G GCCAT 

7 21 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 7 60 
G C T GT C C CTCC 

761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800

CCCTCT G CTC 

• • • • 

801 TT ATACAAACCCAGTATTAGAAAATTTTG ATGGTAGTTTT 84 0 

C T TCTGCCC CC 

84 1 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880 
TTTCATC G CTCC C C 

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920 
C C CT G C T C 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 9 60 
C C A AG G C T A C G 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000 
C C ATA CAGC C G T 

1001 CTTTTCCGCT ATATGGAACT ATGGGAAATGCAGCTCC AC A 104 0 
CTC C C 

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080 
C * T C C 

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120 
CGT CGC CC C 

1121 GG ATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160 
TCCCG TC A 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200 
G C C T T C T 

1201 TACAG AAAAAGCGG AACGGTAGATTCGCTGGATG AAAT AC 124 0 
G C T CT C C 

12 4 1 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280 
A C T C CTC 

12 81 TC ATCG ATT AAGCC ATGTTTC AATGTTTCGTTC AGGCTTT 1320 

CCAGG CGC C CAC 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 13 60 
C C TCC G C C C 

1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400

C G C C C C C

1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440

C 

14 41 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 14 80 

c c c c c 

14 81 CTGGTGGGG ACTTAGTTAG ATT AAAT AGTAGTGG AAAT AA 1520 
A C C C C C C 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560 



15 61 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600 
C A GA 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 164 0 
G T 

164 1 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1 680 

C C T C ' 

1 681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720 
C G C C C C C 

1721 AAAGTGCC AATGCTTTTACATCTTCATTAGGTAATATAGT 17 60 

C C C C 

17 61 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800 
G C T C 

1801 G ACAG ATTTG AATTTATTCC AGTTACTGCAAC ACTCGAGG 184 0 
C G C 

■ • • . 

184 1 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 

A TGCG 

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 
CTGT ACGTCTACA C AGCT G ACTC G CA TG 



1921 G 1921 



FIG. 4C

1 GAAAGAATAGAAACTGGTTACACCCCAATCGATATTTCCT
ATGGCC T C T C C C 

• • • • 
4 1 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 

CTGAG GCCCGCGA 

8 1 TGCTGGATTTGTGTTAGGACTAGTTGATAT AATATGGGGA 
GCTCC CCC T 

121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 
C A T C G G 

161 TTGAAC AGTTAATTAACCAAAGAATAGAAGAATTCGCTAG 

G GC GGC G C 

2 01 GAACC AAGCC ATTTCTAGATTAGAAGGACT AAGCAATCTT 
G C G G T G C 

2 41 TATCAAATTTACGC AGAATCTTTTAGAGAGTGGGAAGCAG 
C C T GAGC C C 

2 81 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 

C TC CC C G A 

321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 
C C T G C A C A 

3 61 CTTTTTGC AGTTCAAAATTATCAAGTTCCTCTTTTATC AG 

TGC CGCC CGC 

4 01 TATATGTTCAAGCTGC AAATTTACATTTATCAGTTTTGAG 

G C A T C T CC CAGC GC TC 

4 41 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 
C AGC G C T 

4 81 GCGACTATC AATAGTCGTTATAATGATTTAACTAGGCTTA 

AC C CCCCT G 

521 TTGGC AACT ATAC AG ATT ATGCTGTACGCTGGTAC AATAC 
A CCCCC TT C 

5 61 GGGATTAGAACGTGTATGGGG ACCGGATTCTAG AG ATTGG 

T C G G C T T 

• • « ■ 
601 GTAAGGTATAATCAATTTAGAAGAGAATTAACACTAACTG 

ATACCGCG GCCA 

641 TATTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAG

T G C T GT C C CTCC

681 AAGATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720
CCCTCT G CTC 

• • • ' • 

7 21 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 7 60 
C T TCTGCCC C 

7 61 TTCGAGGCTCGGCTC AGGGCATAG AAAGAAGTATTAGGAG 800 
CTTTCATC G CTCC C 

• • • • 

801 TCC AC ATTTG ATGGAT AT ACTTAACAGTATAACCATCTAT 84 0 
C C CCTG C T C 

841 ACGGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATC 880 
C CAAGG C TAG 

881 AAATAATGGCTTCTCCTGTAGGGTTTTCGGGGCC AG AATT 920 
G C C ATA CAGC C G 

921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 9 60 
T C T C C C 

9 61 CAAC AACGTATTGTTGCTCAACTAGGTC AGGGCGTGTATA 1000 

C TCC 

• • • • 

1001 GAACATTATCGTCCACTTTATATAGAAGACCTTTTAATAT 1040 
CGT CGC CC 

1041 AGGGATAAATAATCAACAACTATCTGTTCTTGACGGGACA 1080 
CTCCCG TC A 

• * • " 

1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120 
G C C T T C 

1 121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160 
T G C T CT C 

1161 ACCGCC ACAGAAT AACAACGTGCC ACCTAGGCAAGGATTT 1200 
C A C T C C 

1201 AGTC ATCGATT AAGCCATGTTTC AATGTTTCGTTC AGGCT 12 4 0 
TCC CA GG CGC C CA 

12 41 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280 

C C C TCC G C C C 

1281 CTCTTGGATACATCGTAGTGCTGAATTTAATAATAT AATT 1320 

C G C C C C C 

1321 GCATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAA 1360 
C 

1361 ACTTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATT 1400

C C C C

1401 TACTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAAT 1440

C ACC CCCC 

14 41 AAC ATTC AGAATAGAGGGTATATTGAAGTTCC AATTCACT 14 80 



1481 TCCC ATCG ACATCTACCAGATATCGAGTTCGTGTACGGTA 1520 
C A GA 

• ■ • • 

1521 TGCTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGT 1560 

G T 

15 61 AATTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTA 1 600 

C C T 

1601 CGTC ATT AGAT AATCT AC AATCAAGTG ATTTTGGTT ATTT 1640 
CCG C CC C C 

1641 TGAAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATA 1680 

C C C C 

1681 GTAGGTGTTAG AAATTTTAGTGGG ACTGCAGGAGTG ATAA 1720 
G C T 

17 21 TAGAC AGATTTGAATTTATTCC AGTTACTGCAAC ACTCGA 17 60 
C C G C 

1761 GGCTGAA 1767 
G

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA
CCA C AC 

4 1 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 
C C G A T C T 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG
CCT C TC CC 

• • • * 
121 TCGCT AACGC AATTTCTTTTGAGTG AATTTGTTCCCGGTG 

CTGAG GCCCGCGA 

• • • » 
161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 

GCTCC CCC T 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 
C A T C G G 

2 41 G AACAGTTAATTAACCAAAGAAT AGAAGAATTCGCTAGGA 
G GC GGC G C 

2 81 ACC AAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 

G C G G T G C 

321 TC AAATTTACGC AGAATCTTTTAGAGAGTGGGAAGCAG AT 
C C T GAGC C C 

• • • • 

3 61 CCTACTAATCC AGCATTAAGAGAAGAGATGCGTATTC AAT 

C TC CC C G A 

4 01 TC AATGACATG AAC AGTGCCCTTACAACCGCTATTCCTCT 

C CTGCA CAT 

4 4 1 TTTTGCAGTTC AAAATTATC AAGTTCCTCTTTTATCAGTA 
GC CGCC CGCG 

4 81 TATGTTCAAGCTGCAAATTTAC ATTTATC AGTTTTGAGAG 
C AT C T CC CAGC GC TC 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 
C AGC G C T 

561 GACTATCAATAGTCGTT ATAATG ATTTAACTAGGCTTATT 
AC C CCCCT G 

60 1 GGCAACTATAC AGATTATGCTGTACGCTGGTACAATACGG 
A CCCCC TT CT 

641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT
C G G C T T A

681 AAGGT ATAATC AATTTAGAAGAG AATTAACACTAACTGTA 720 
TACCGCG GCCAT 

• » • • 

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 7 60 
G C T GT C C CTCC 

• • • • 

7 61 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800 

CCCTCT G CTC 

■ • • 

801 TTATACAAACCCAGTATTAGAAAATTTTG ATGGT AGTTTT 840 
C T TCTGCCC CC 

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880 
TTTCATC G CTCC C C 

8 81 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920 

C C CT G C T C 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960 
C CAAGG C TACG 

9 61 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000 

C C ATA CAGC C G T 

1001 CTTTTCCGCTATATGG AACTATGGGAAATGC AGCTCC AC A 10 4 0 
CTC C C 

• • • • 

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080 
C T C C 

1081 ACATT ATCGTCC ACTTT AT ATAG AAGACCTTTTAATAT AG 1120 
CGT CGC CC C 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160 
TCCCG TC A 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200 
G C C T T C T 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240
G C T CT C C 

12 41 CGCC AC AGAATAACAACGTGCCACCTAGGCAAGG ATTTAG 12 80 
A C T C CTC 

12 81 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320 

CCAGG CGC C C- AC 

1321 AGTAATAGTAGTGT AAGTAT AAT AAG AGCTCCT ATGTTCT 13 60 
C C TCC G C C C 

13 61 CTTGG AT AC ATCGTAGTGCTG AATTTAAT AATATAATTGC 14 00 

C G C C C C C 



FIG. 9B



U.S. Patent Mar. 9, 1999 sheet 20 of 46 



5,880,275 



14 01 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 14 4 0 
C 

. • • • 

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480
C C C C C 

• • • " 

14 81 CTGGTGGGG ACTTAGTT AGATTAAATAGTAGTGGAAATAA 15 2 0 

A C C C C C C 

• • • • 

1521 C ATTC AGAATAGAGGGTATATTG AAGTTCC AATTC ACTTC 15 60 

15 61 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600 

C A GA 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 164 0 
G T 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680 
C C T C 

1681 TCATT AG AT AATCT AC AATC AAGTG ATTTTGGTTATTTTG 17 2 0 

C G C C C C C

17 21 AAAGTGCC AATGCTTTTACATCTTCATTAGGTAATATAGT 17 60 

C C C C

17 61 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800 
G C T C 

1801 GACAG ATTTG AATTT ATTCC AGTT ACTGC AAC ACTCG AGG 184 0 
C G C 

1841 CTG AATATAATCTGG AAAGAGCGC AGAAGGCGGTG AATGC 1880 



1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAAC AAAT 192 0 



1921 GTAACGGATTATCATATTGATC AAGTGTCC AATTT AGTTA 1960 



1961 CGTATTTATCGGATGAATTTTGTCTGG ATGAAAAGCG AGA 2000 



2001 ATTGTCCGAGAAAGTCAAAC ATGCGAAGCGACTCAGTGAT 204 0 



204 1 G AACGC AATTT ACTCCAAGATTCAAATTTCAAAG AC ATTA 2080 



2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120

2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160 

• • • • 

2161 GTCAC ACTATC AGGTACCTTTG ATGAGTGCTATCCAACAT 2200 

2201 ATTTGTATC AAAAAATCGATGAATCAAAATTAAAAGCCTT 2240 

• • • • 

22 41 TACCCGTTATC AATTAAGAGGGTATATCG AAGATAGTCAA 2280 

• ■ * • 

2281 G ACTT AG AAATCT ATTT AATTCGCTAC AATGC AAAACATG 2320 

• • ■ ■ 

2 321 AAAC AGTAAATGTGCCAGGT ACGGGTTCCTT ATGGCCGCT 23 60 

23 61 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 24 00 

■ • • • 

2 4 01 CGATGCGCGCC ACACCTTGAATGGAATCCTGACTTAGATT 2 4 4 0 

24 41 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 24 80 

• • » • 

24 81 TC ATTTCTCCTT AGAC ATTGATGT AGGATGTAC AGACTTA 2520 

2521 AATG AGG ACCT AGGTGTATGGGTGATCTTTAAG ATTAAGA 2560 

■ • • • 

2561 CGCAAGATGGGC ACGCAAGACTAGGGAATCTAGAGTTTCT 2 600 

• • • • 

2 60 1 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 264 0 

2 64 1 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2 680 

2 681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720 

2721 ATCTGTAG ATGCTTTATTTGTAAACTCTC AATATG ATCAA 27 60 

27 61 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800 

2801 ATAAACGTGTTC ATAGC ATTCGAGAAGCTTATCTGCCTGA 2 84 0 

FIG. 9D

2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880

2881 G AATT AGAAGGGCGTATTTTCACTGC ATTCTCCCT ATATG 2 920 

2 921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2 960 

• • • 

2961 CTTATCCTGCTGG AACGTG AAAGGGCATGT AG ATGTAGAA 3000 

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 304 0 

• • • • 

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080 

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120 

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160 

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200 

. • • " 

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240 

. • • ■ 

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280 

. • • 

3281 ATCGAGGAT AT AACGAAGCTCCTTCCGTACC AGCTGATTA 332 0 

• • • • 

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 33 60 

. . • • 

3361 AG AG AG AATCCTTGTG AATTT AACAGAGGGT ATAGGGATT 3400 

• ■ • * 

3 401 AC ACGCC ACTACCAGTTGGTTATGTGAC AAAAGAATTAG A 3 4 4 0 

• • • * 

3 4 41 ATACTTCCC AG AAACCG ATAAGGTATGGATTGAGATTGGA 34 80 

3 4 81 GAAACGGAAGG AAC ATTTATCGTGGAC AGCGTGGAATTAC 3 52 0 
3 521 TCCTTATGGAGGAA 3 5 34 

FIG. 9E

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40
CCA C AC 

4 1 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80 
C C G A T C T 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120 
CCT C TC CC 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160 
CTGAG GCCCGCGA 

. . • • 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200 
GCTCC CCC T 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTAC AAATT 24 0 
C A T C G G 

. • • • 

241 GAAC AGTTAATTAACCAAAGAATAGAAG AATTCGCTAGGA 280 
G GC GGC G C 

1 • ♦ • 

2 81 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320 

G C G G T G C 

• • • " 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 3 60 
C C T GAGC C C 

• • • • 

3 61 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400 

C TC CC C G A 

• • • • 

4 01 TCAATGACATG AACAGTGCCCTTACAACCGCTATTCCTCT 4 4 0 

C CTGCA CAT 

4 41 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 4 80 
GC CGCC CGCG 

4 81 T ATGTTCAAGCTGCAAATTTACATTTATC AGTTTTGAGAG 520 

C A T C T CC CAGC GC TC 

• • • ■ 

5 21 ATGTTTCAGTGTTTGGAC AAAGGTGGGGATTTGATGCCGC 5 60 

C AGC G C T 

• • • • 

5 61 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 
AC C CCCCT G 

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 640
A CCCCC TT CT

641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680
C G G C T T A 

681 AAGGT ATAATC AATTTAGAAGAG AATTAACACTAACTGTA 7 20 
TACCGCG GCCAT 

7 21 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 7 60 
G C T GT C C CTCC 

• • • • 

7 61 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800 
CCCTCT G CTC 

• • ■ * 

801 TTATACAAACCC AGTATT AG AAAATTTTGATGGT AGTTTT 840 
C T TCTGCCC CC 

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880 
TTTCATC G CTCC C C 

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920 
C C CT G C T C 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 9 60 
C CAAGG C TACG 

961 ATAATGGCTTCTCCTGT AGGGTTTTCGGGGCC AG AATTC A 1000 
C C ATA CAGC C G T 

1001 CTTTTCCGCTATATGGAACTATGGG AAATGC AGCTCC AC A 104 0 
CTC C C 

• • • • 

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080 
C T C C 

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120 

COT CGC CC C 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160 
TCCCG TC A 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200 
G C C T T C T 

1201 TACAG AAAAAGCGGAACGGTAG ATTCGCTGG ATGAAAT AC 124 0 
G C T CT C C 

12 41 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 12 80 
A C T C CTC 

1281 TC ATCGATTAAGCC ATGTTTCAATGTTTC GTTCAGGCTTT 1320 
CCAGG CGC C CAC 

1321 AGT AATAGTAGTGTAAGTATAAT AAG AGC TCCT ATGTTCT 13 60 
C C TCC G C C C 
1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400 
C G C C C C C 

1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440 
C 

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480 
C C C C C 

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520 
A C C C C C C 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560 

1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600 
C A GA 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640 
G T 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680 
C C T C 

1681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720 
C G C C C C C 

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760 
C C C C 

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800 
G C T C 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840 
C G C 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 
G C C C G C 

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960 
G C G G 

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000 
C CC CAGC G C 

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040 

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080 

FIG. 10C 



U.S. Patent Mar. 9, 1999 Sheet 26 of 46 5,880,275 

. • • • 

2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120 

2121 T ACC ATCC AAGGAGGGG ATGACGTATTT AAAGAAAATTAC 2160 
G TC G C G G C 

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200 

2201 ATTTGTATC AAAAAATCGATGAATC AAAATTAAAAGCCTT 22 40 
CCCCGG CGCGG 

22 4 1 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280 

2281 G ACTT AGAAATCT ATTT AATTCGCT AC AATGC AAAAC ATG 2320 
C C G CC C C 

2 321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360 

23 61 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2 4 00 
2 401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2 44 0 
244 1 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480 

2 4 81 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520 

. • • • 

2 521 AATG AGG ACCT AGGTGT ATGGGTGATCTTTAAG ATT AAG A 2 560 

25 61 CGC AAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2 600 

2 601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2 64 0 

2 641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2 680 

G G 

2 681 TGGAATGGG AAACAAAT ATCGTTTAT AAAG AGGCAAAAGA 2720 

G C C C C 

. > • * 

2 7 21 ATCTGT AG ATGCTTT ATTTGTAAACTCTCAATATG ATG AA 2 7 60 

27 61 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2 800 
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2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 284 0 

2 841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880 

. • • 

2 881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2 920 

C C 

2 921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2 960 
C C CGC CCC 

. • • 

2 961 CTT ATCCTGCTGG AACGTG AAAGGGC ATGTAGATGTAG AA 3000 

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 304 0 

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080 

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120 

3121 TATGG AGAAGGTTGCGTAACCATTC ATGAGATCGAGAAC A 3160 

3161 ATAC AG ACG AACTG AAGTTTAGC AACTGCGTAG AAG AGG A 3200 

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 324 0 

3241 GT AAATC AAG AAG AATACGG AGGTGCGTAC ACTTCTCGT A 32 80 

32 81 ATCG AGGATATAACGAAGCTCCTTCCGT ACC AGCTGATTA 332 0 

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360 

3 3 61 AGAGAGAATCCTTGTGAATTTAACAG AGGGTAT AGGGATT 34 00 

3 401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3 4 4 0 

3 4 4 1 ATACTTCCC AGAAACCGATAAGGTATGG ATTGAGATTGGA 3 4 80 

3 4 81 GAAACGG AAGGAACATTTATCGTGG ACAGCGTGGAATTAC 3 520 

3 521 TCCTTATGGAGGAA 35 34 

FIG. 10E 
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1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 
CCA C AC 

4 1 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 
C C G A T C T 

* • * * 
8 1 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 

CCT C TC CC 

• • ' • 
121 TCGCTAACGC AATTTCTTTTGAGTG AATTTGTTCCCGGTG 

CTGAG GCCCGCGA 

♦ • ' • 

161 CTGGATTTGTGTTAGGACTAGTTGATATAAT ATGGGGAAT 
GCTCC CCC T 

201 TTTTGGTCCCTCTCAATGGGACGC ATTTCTTGTACAAATT 
C A T C G G 

2 41 GAAC AGTTAATTAACCAAAGAATAGAAGAATTCGCTAGG A 

G GC GGC G C 

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 
G C G G T G C 

. • • 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 
C C T GAGC C C 

• • • • 

3 61 CCTACTAATCCAGC ATTAAGAGAAG AGATGCGTATTCAAT 

C TC CC C G A 

. . • • 

4 01 TCAATGACATGAACAGTGCCCTTAC AACCGCTATTCCTCT 

C CTGCA CAT 

4 41 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 
GC CGCC CGCG 

4 81 TATGTTC AAGCTGCAAATTTACATTTATCAGTTTTGAGAG 

C A T C T CC CAGC GC TC 

521 ATGTTTCAGTGTTTGG AC AAAGGTGGGGATTTGATGCCGC 
C AGC G C T 

5 61 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 

AC C CCCCT G 

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 
A CCCCC TT CT 

64 1 GATTAG AACGTGTATGGGG ACCGGATTCTAGAG ATTGGGT 
C G G C T T A 
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681 AAGGTATAATCAATTTAG AAGAGAATTAAC ACTAACTGTA 720 
TACCGCG GCCAT 

7 21 TTAGATATCGTTGCTCTGTTCCCG AATTATGATAGTAGAA 7 60 
G C T GT C C CTCC 

7 61 GATATCCAATTCGAACAGTTTCCC AATTAACAAGAGAAAT 800 
CCCTCT G CTC 

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840 
C T TCTGCCC CC 

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880 
TTTC ATC G CTCC C C 

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920 
C C CT G C T C 

92 1 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960 
C CAAGG C TACG 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000 
C C ATA CAGC C G T 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040 
CTC C C 

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080 
C T C C 

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120 
CGT CGC CC C 

1121 GGATAAATAATC AACAACTATCTGTTCTTGACGGGACAGA 1160 
TCCCG TC A 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200 
G C C T T C T 

• > • • 

1201 T ACAGAAAAAGC GG AACGGT AG ATTCGCTGGATGAAATAC 124 0 
G C T CT C C 

124 1 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280 
N A C T C CTC 

12 81 TC ATCG ATTAAGCC ATGTTTC AATGTTTCGTTCAGGCTTT 1320 

CCAGG CGC C CAC 

1321 AGTAAT AGTAGTGTAAGTATAATAAG AGCTCCTATGTTCT 1360 
C C TCC G C C C 

13 61 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 14 00 

C G C C C C C 
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14 01 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 14 40 
C 

14 41 TTTCTTTTTAATGGTTCTGTAATTTC AGGACC AGG ATTTA 1480 

C C C C C 

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520 
A C C C C C C 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560 

15 61 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600 

C A GA 

1 601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 164 0 
G T 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680 
C C T C 

1 681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720 

C G C C C C C 

17 21 AAAGTGCCAATGCTTTTAC ATCTTCATTAGGTAATATAGT 17 60 

C C C C 

17 61 AGGTGTTAGAAATTTTAGTGGGACTGCAGG AGTG AT AATA 1800 
G C T C 

1801 GAC AGATTTG AATTTATTCC AGTT ACTGC AAC ACTC GAGG 1840 
C G C 

184 1 CTG AAT AT AATCTGGAAAGAGCGC AG AAGGCGGTG AATGC 1880 
GCCTG C T C 

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 
CC CCCTGTCTG TC 

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960 
TTC C C CGC 

19 61 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000 
C CC TAGC G C C C C G T 

2 001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040 
CC T CC T CC 

2 041 G AACGC AATTTACTCC AAGATTC AAATTTC AAAGAC ATTA 2080 
GAGCCTG CCC C 

2 081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 212 0 
C G T T C C 
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2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160 
C CCTGCGGC 

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200 
C CCATCC CTC 

• • • • 

2 201 ATTTGTATCAAAAAATCG ATGAATC AAAATTAAAAGCCTT 2240 
C CGG GCCC 

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280 
CAG CT CC CC 

• ♦ • • 

2 281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAAC ATG 2320 
CT CCGCAG CGC 

• ■ • • . • 

2 321 AAAC AGT AAATGTGCCAGGTACGGGTTCCTT ATGGCCGCT 2360 
GCG C T CC A 

2 3 61 TTCAGCCCAAAGTCCAATCGGAAAGTGTGG AGAGCCGAAT 2 4 00 
T TC C T G T C 

• • • • 

2 4 01 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2 4 4 0 
AT G G C 

2 4 41 GTTCGTGTAGGGATGGAGAAAAGTGTGCCC ATCATTCGCA 2 4 80 
CGC C G C T 

a • • • 

2 4 81 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520 
C G C G T C G 

2 521 AATG AGGACCT AGGTGT ATGGGTG ATCTTT AAG ATT AAG A 25 60 

C A C C C C 

2 5 61 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2 600 
C C A T C C T 

2 601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2 640 

G C T T C 

2 641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2 680 
G A G G G GO 

2 681 TGGAATGGGAAACAAATATCGTTTATAAAG AGGC AAAAGA 2 720 
C T C CGC 

2 721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 27 60 
GCG GCG C G 

27 61 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800 
G CCCCC CCC 

2 801 ATAAACGTGTTC ATAGCATTCGAGAAGCTTATCTGCCTGA 2 84 0 
C G C T G CT 
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2 841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880 
T C CT GCTCCCG 

• • • • 

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920 
CTGA CTC TGC 

• • • • 

2 921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2 960 
C C CGC C CC 

• • ■ " 

2 9 61 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000 

C CAG T T G C G G 

• • • • 

3001 G AAC AAAAC AACC AACGTTCGGTCCTTGTTGTTCCGGAAT 304 0 
G T G C G GTG 

• • • • 

3041 GGG AAGC AGAAGTGTCAC AAG AAGTTCGTGTCTGTCCGGG 3080 
T C G A A A 

3 081 TCGTGGCT ATATCCTTCGTGTC AC AGCGTACAAGGAGGGA 3120 

AA CTC GCT 

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160 
C T G G C C 

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200 
C C G T CTC C G A 

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 32 40 
CC CTTCCCC 

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280 
GGG C AGC 

3 281 ATCGAGGATAT AACG AAGCTCCTTCCGTACCAGCTG ATT A 3 320 
CA T C T T C 

3321 TGCGTC AGTCTATG AAGAAAAATCGTATACAGATGGACGA 3360 
CCGCGG CC CA 

33 61 AGAGAG AATCCTTGTG AATTTAAC AGAGGGTAT AGGGATT 3400 
CT C CGC TC C 

3 4 01 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3 4 4 0 
A T C TCGGCT 

3 4 41 ATACTTCCC AG AAACCGATAAGGTATGG ATTGAG ATTGG A 34 80 
G TTG CAG C CT 

3 4 81 GAAACGG AAGG AACATTT ATCGTGGAC AGCGTGG AATTAC 3520 
C G C C GC T 

3521 TCCTTATGGAGGAA 3534 
T G 
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1 ATGACTGCAGATAATAATACGGAAGCACTAGATAGCTCTA 
CCCC CCCT 

• • ♦ • 
4 1 CAACAAAAGATGTCATTCAAAAAGGCATTTCCGTAGTAGG 

CTG TCGGTC TG 

• • • • 
8 1 TGATCTCCTAGGCGTAGT AGGTTTCCCGTTTGGTGGAGCG 

AC TG GTATCC C 

121 CTTGTTTCGTTTTATAC AAACTTTTTAAATACTATTTGGC 
C GAGC C CCCC 

■ • • • 
161 CAAGTGAAGACCCGTGGAAGGCTTTTATGGAACAAGTAGA 

CG T AAC G T 

201 AGCATTGATGG ATCAGAAAATAGCTGATTATGCAAAAAAT 
TCT GTA CGC 

• • * • 

2 41 AAAGCTCTTGCAGAGTTACAGGGCCTTCAAAATAATGTCG 

GTG ACC GC G 

281 AAGATT ATGTGAGTGCATTGAGTTCATGGC AAAAAAATCC 
G C C TCCAGC G G C 

. • • • 

321 TGTG AGTTCACGAAATCC ACATAGCC AGGGGCGGATAAGA 
T C CA T C A TA C 

■ • « > 

3 61 GAGCTGTTTTCTC AAGCAGAAAGTC ATTTTCGTT^TTCAA 

T C C TCC C CA A C 

• • • • 

4 01 TGCCTTCGTTTGCAATTTCTGGAT ACGAGGTTCTATTTCT 

AGC T C C T T C 

4 4 1 AACAAC ATATGCAC AAGCTGCC AACACACATTTATTTTTA 
CTC T CCGCC 

4 81 CTAAAAGACGCTCAAATTTATGGAGAAGAATGGGGATACG 

T G C G 

521 AAAAAG AAG ATATTGCTGAATTTTATAAAAGACAACTAAA 
G GC GCCGCT T 

5 61 ACTTACGCAAGAATATACTGACC ATTGTGTCAAATGGTAT 

G C C G C C G 

601 AATGTTGGATTAGATAAATTAAGAGGTTCATCTTATGAAT 
C TCC GCC CTCCG 

64 1 CTTGGGTAAACTTTAACCGTTATCGCAGAGAGATGACATT 
G C A A CA G C 
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681 AACAGTATTAGATTTAATTGCACTATTTCC ATTGTATGAT 7 20 
GTGCCCTC C C C 

721 GTTCGGCTATACCCAAAAGAAGTTAAAACCGAATTAACAA 7 60 
GAA C G G TGCTC 

» • • • 

7 61 GAGACGTTTTAACAGATCCAATTGTCGGAGTCAACAACCT 800 
GC C T C T 

• • • • 

801 TAGGGGCTATGGAACAACCTTCTCTAATATAGAAAATTAT 840 
T T AGC C C C 

• ♦ • • 

841 ATTCGAAAACCAC ATCTATTTGACTATCTGC ATAG AATTC 880 
AG C C T C 

881 AATTTCACACGCGGTTCCAACCAGGATATTATGGAAATGA 920 
C AA T C T C 

92 1 CTCTTTCAATTATTGGTCCGGTAATTATGTTTCAACTAGA 960 
CO C CO 

• • • • 

961 CCAAGCATAGGATCAAATGATATAATCACATCTCCATTCT 1000 
T T C C C 

1001 ATGG AAATAAATCC AGTG AACCTGTAC AAAATTT AG AATT 104 0 
T C G G G CC T G 

1041 TAATGGAGAAAAAGTCTATAGAGCCGTAGCAAATACAAAT 1080 
C C C G C C C 

• * • • 

1081 CTTGCGGTCTGGCCGTCCGCTGTATATTCAGGTGTTAC AA 1120 
CTG A ATC CC 

« * • • 

1121 AAGTGG AATTTAGCC AATATAATG ATC AAACAGATGAAGC 1160 
G G TG C GC G 

1161 AAGT AC AC AAACGTACG ACTC AAAAAGAAATGTTGGCGCG 1200 
CCCGT CCTC A 

1201 GTC AGCTGGGATTCTATCGATC AATTGCCTCCAGAAAC AA 12 40 
TCT C C 

12 41 C AGATG AACCTCTAGAAAAGGGAT ATAGCC ATC AACTCAA 12 80 
C ATGG CC C T 

12 81 TT ATGTAATGTGCTTTTT AATGC AGGGT AGTAG AGG AAC A 1320 

C G C G A TCC G C 

13 21 ATCCCAGTGTTAACTTGGAC AC ATAAAAGTGTAGACTTTT 13 60 

T G C C GTCC G C 

13 61 TTAACATGATTGATTCGAAAAAAATTACAC AACTTCCGTT 14 00 
C C AGC G G C T C 
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• • • • 

14 01 AGTAAAGGCATATAAGTTAC AATCTGGTGCTTCCGTTGTC 14 40 
G G A C C C G 

• • • • 

14 41 GCAGGTCCTAGGTTTACAGGAGGAGATATC ATTCAATGCA 1480 

CACT TC CG* 

• m • • 

1481 CAGAAAATGGAAGTGCGGCAACTATTTACGTTACACCGGA 1520 
GCCCAT C G T 

■ • • • 

1521 TGTGTCGTACTCTCAAAAATATCGAGCTAGAATTCATTAT 1560 
T G G CA G AC T C 

• • • • 

15 61 GCTTCTACATCTCAGATAACATTTAC ACTCAGTTTAGACG 1 600 

A CAGC C C C C G T 

• • • • 

1601 GGGCACCATTTAATCAAT ACTATTTCGATAAAACGATAAA 1640 
A CCCGTCTCGCC 

• • • • 

1641 TAAAGGAGACACATTAACGTATAATTCATTTAATTTAGCA 1 680 
C T TC C A C AGC C C G 

• • • . 

1681 AGTTTCAGC ACACC ATTCGAATTATCAGGGAATAACTTAC 1720 

T C C C C TC T 

• • • • 

1721 AAATAGGCGTCAC AGGATTAAGTGCTGGAGATAAAGTTTA 17 60 
GC CTCCCC C C 

• • • 

17 61 TATAGAC AAAATTGAATTTATTCC AGTGAAT 17 91 
C C G G C C C 
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1 ATG AATAATGTATTGAATAGTGGAAGAACAACTATTT 40 
GAC C C C CTC T C C . 

4 1 GTGATGCGTATAATGTAGTAGCCCATGATCCATTTAGTTT 80 
CCACCCGTC CC 

81 TGAACATAAATCATTAGATACCATCCAAAAAGAATGGATG 120 
C C GAGCC C C T T G G G 

121 GAGTGGAAAAGAAC AGATC ATAGTTTATATGTAGCTCCTG 1 60 
A C T T C CTC C C C C A 

161 TAGTCGGAACTGTGTCTAGTTTTTTGCTAAAGAAAGTGGG 200 
GT A CCCCTC GC 

• ' • ■ 

201 G AGTCTT ATTGG AAAAAGG ATATTG AGTG AATTATGGGGG 2 40 
CTC C C CTC TCC C C T 

2 41 ATAATATTTCCTAGTGGTAGTACAAATCTAATGCAAGATA 280 
C C ATC GTCC T C C 

2 81 TTTTAAGGGAGAC AGAACAATTCCTAAATCAAAGACTTAA 320 

CG C GTCCGCTC 

• • ♦ " 

3 21 TACAGAT ACCCTTGCTCGTGTAAATGC AGAATTGATAGGG 3 60 

CT TG AACCTG CT 

• . • 

3 61 CTCCAAGCGAATATAAGGGAGTTTAATCAACAAGTAGATA 400 

ACTCT CCG GC 

4 01 ATTTTTTAAACCCTACTCAAAACCCTGTTCCTTTATC AAT 4 4 0 

CCGTA GT G CTC 

4 4 1 AACTTCTTCGGTTAATACAATGC AGC AATTATTTCTAAAT 4 80 
C CGCT CCCCC 

4 81 AGATTACCCCAGTTCCAGATACAAGGATACCAGTTGTTAT 520 

G T T T C C CC 

• • • * 

5 21 TATTACCTTTATTTGCACAGGCAGCCAATATGCATCTTTC 5 60 

TC T AC C T T C CT G 

5 61 TTTTATTAGAGATGTTATTCTTAATGCAGATGAATGGGGT 600 

CCACTCGCCCTC A 

• • • • 

6 01 ATTTCAGC AGCAACATTACGTACGTATCGAG ATTACCTGA 640 

C T C TC TA G A CA C T 

• • • " 

64 1 GAAATTATACAAGAGATTATTCTAATTATTGTATAAATAC 680 
GCCTCT CCC CCC 
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• • ♦ • 

681 GTATCAAACTGCGTTTAGAGGGTTAAACACCCGTTTACAC 720 
T G C C T AC C T TA GC T 

721 GATATGTTAGAATTTAGAACATATATGTTTTTAAATGTAT 7 60 
C CTGCGCC CCTCG 

• ■ ■ • 

7 61 TTGAATATGTATCCATTTGGTCATTGTTTAAATATCAGAG 800 

G C CAG AGTC C C G C 

801 TCTTATGGTATCTTCTGGCGCTAATTTATATGCTAGCGGT 840 
CTG GC AC CCC CTCT C 

8 41 AGTGGACCACAGCAGACACAATCATTTACAGCACAAAACT 880 

A T GAGC C T G 

8 81 GGCCATTTTTATATTCTCTTTTCCAAGTTAATTCGAATTA 920 

C G AGCT G C C C C 

921 TATATTATCTGGTATTAGTGGTACTAGGCTTTCTATTACC 9 60 
C TC CAG CTC G C A C C A 

9 61 TTCCCTAATATTGGTGGTTTACCGGGT AGT ACTAC AACTC 1000 

T C C AC T A CTCC C 

1001 ATTC ATTG AAT AGTGCCAGGGTTAATT ATAGCGGAGG AGT 104 0 
AGCC T CTC A G C C T T 

10 41 TTC ATCTGGTCTC ATAGGGGCGACTAATCTCAATCAC AAC 1080 
CAGC AT G T T A CT G C 

1081 TTTAATTGCAGCACGGTCCTCCCTCCTTTATCAACACCAT 1120 
C TC C T G - A C GAGC G 

1121 TTGTTAG AAGTTGGCTGGATTC AGGT AC AGATCGAGAGGG 1160 
G GTCC T CAGC T C A 

1161 CGTTGCTACCTCTACGAATTGGCAGACAGAATCCTTTCAA 1200 
A AC A C G C 

12 01 AC AACTTTAAGTTTAAGGTGTGGTGCTTTTTCAGCCCGTG 124 0 
CCTCCTC A CTA 

12 41 GAAATTCAAACTATTTCCCAGATTATTTTATCCGTAATAT 1280 
G CT CCCTAGC 

12 81 TTCTGGGGTTCCTTT AGTT ATT AG A AACG AAG ATCTAAC A 1320 

C T CCCCGT CCC 

13 21 AGACCGTTACACTATAACC AAAT AAGAAATATAGAAAGTC 1360 

CTACTTC GTGCC GTC 

13 61 CTTCGGG AACACCTGGTGGAGCACGGGCCTATTTGGT ATC 14 00 
ACTTAAT AATCCCG 



FIG.13B 






14 01 TGTGCATAAC AGAAAAAATAATATCTATGCCGCTAATGAA 14 40 
C GGCC CTCCG 

14 41 AATGGTACTATGATCCATTTGGCGCCAGAAGATTATACAG 14 80 
CC TCCTA CT 

14 81 GATTTACTATATCGCCAATACATGCCACTCAAGTGAATAA 1520 

CCCT C TC C 

1521 TC AAACTCG AACATTTATTTCTGAAAAATTTGGAAATCAA 1560 
GACCCCC GC 

15 61 GGTGATTCCTTAAGATTTGAACAAAGCAACACGACAGCTC 1600 

C GGCG TC TCA 

1601 GTTATACGCTTAGAGGGAATGGAAATAGTTACAATCTTTA 1640 
GCTTG C CC C 

• • • • 

1641 TTTAAG AGTATCTTCAATAGGAAATTCAACTATTCGAGTT 1680 
C G TAGC CTTCCCCT 

1681 ACTATAAACGGTAGAGTTT ATACTGTTTC AAATGTTAATA 1720 
CC ACT CACT GC 

17 21 CCACTACAAATAACGATGGAGTTAATGATAATGGAGCTCG 17 60 
TAGCT C CCC CA 

17 61 TTTTTCAGATATTAATATCGGTAATATAGTAGCAAGTGAT 1800 

A CAGC CCCTCCCG CTC C 

18 01 AATACTAATGT AACGCTAGATATAAATGTGAC ATTAAACT 184 0 

C CTTTGCC CCCT 

18 41 CCGGTACTCCATTTGATCTCATGAATATTATGTTTGTGCC 1880 
T A C C 

18 81 AACTAATCTTCCACCACTTTAT 1902 
C C T T G C 
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• • • • 

1 ATGGAGGAAAAT AATCAAAATCAATGC ATACCTTACAATT 4 0 
G C C C T A C 

• • • • 

4 1 GTTTAAGTAATCCTGAAGAAGTACTTTTGGATGGAGAACG 80 
CG CA GTGCT 

» • • • 

81 GATATCAACTGGTAATTCATCAATTGATATTTCTCTGTCA 120 
CT C CTCCCCCT C 

121 CTTGTTCAGTTTCTGGTATCTAACTTTGTACCAGGGGGAG 1 60 
T G C CAGC C G T T 

161 GATTTTTAGTTGGATTAATAGATTTTGTATGGGGAATAGT 200 
GCCTC C TCCC TC 

201 TGGCCCTTCTC AATGGGATGCATTTCTAGTACAAATTGAA 2 4 0 
T A C G G G 

• • • • 

2 41 CAATTAATTAATGAAAGAATAGCTGAATTTGCTAGGAATG 280 

G G C C G G C GCC C 

281 CTGCTATTGCT AATTTAG AAGG ATTAGG AAACAATTTC AA 320 
CC CG GCTC 

321 T ATATATGTGGAAGCATTTAAAG AATGGGAAGAAGATCCT 3 60 
CC GCC G GC 

• . ■ • 

3 61 AATAATCCAGAAACCAGGACCAGAGTAATTGATCGCTTTC 400 

C G CCTGGCCAACA 

401 GTATACTTGATGGGCTACTTGAAAGGGAC ATTCCTTCGTT 4 40 
ACTG C C CTG G AT C AC 

441 TCGAATTTCTGGATTTGAAGTACCCCTTTTATCCGTTTAT 4 80 
CA C CC TTCG GC 

4 81 GCTCAAGCGGCCAATCTGCATCTAGCTATATTAAGAGATT 520 

AT T C C CC TC CA 

521 CTGTAATTTTTGGAGAAAGATGGGGATTGACAACGATAAA 5 60 
GCC G G CTC 

5 61 TGTCAATGAAAACTATAATAGACTAATTAGGCATATTGAT 600 

C GTCC TC C C 

601 GAATATGCTGATCACTGTGCAAATACGTATAATCGGGGAT 640 

GCCC TCCCCTC 

641 TAAATAATTTACCGAAATCTACGTATCAAGATTGGATAAC 680 
GCCCTG T T 

681 ATATAATCGATTACGGAGAGACTTAACATTGACTGTATTA 7 20 
C C CA G GA G CC C A T G 
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7 21 GATATCGCCGCTTTCTTTCCAAACTATGACAATAGGAGAT 
C T A C G C 

• • • • 

7 61 ATCCAATTCAGCCAGTTGGTCAACTAACAAGGGAAGTTTA 
CTCA G TCA C 

801 TACGGACCCATT AATTAATTTTAATCC AC AGTTAC AGTCT 
T CT C CC T G AAG 

841 GTAGCTC AATTACCTACTTTTAACGTTATGGAGAGCAGCC 
CCCTCAC C TC 

881 GAATTAGAAATCCTC ATTTATTTG ATATATTGAATAATCT 
TCGCACG CC CC 

921 TACAATCTTTACGGATTGGTTTAGTGTTGGACGCAATTTT 
T CC CC GTCC 

961 TATTGGGGAGGACATCGAGTAATATCTAGCCTTATAGGAG 
T CA G C C CTCT T 

1001 GTGGT AAC ATAACATCTCCTATATATGG AAGAGAGGCG AA 
G T C C C T A 

104 1 CCAGGAGCCTCC AAGATCCTTTACTTTTAATGGACCGGTA 
A C TAGT C C C C T A C 

1081 TTTAGGACTTTATCAAATCCTACTTTACGATTATTAC AGC 
CACGTC CGA GCC 

« » ■ • 

1121 AACCTTGGCCAGCGCCACCATTTAATTTACGTGGTGTTG A 

T T C CC TA A 

1161 AGGAGTAGAATTTTCTACACCTACAAATAGCTTTACGTAT 
G C T G C T C CTC C T C 

12 01 CGAGGAAGAGGTACGGTTG ATTCTTTAACTGAATTACCGC 
A T AC CGCCCA 

12 41 CTGAGG ATAATAGTGTGCC ACCTCGCGAAGG ATATAGTC A 
A C C CA G C CTCC 

12 81 TCGTTT ATGTC ATGC AACTTTTGTTC AAAGATCTGG AAC A 

CAGGCC CCGGCTC T 

1321 CCTTTTTTAAC AACTGGTGT AGT ATTTTCTTGG ACC GATC 
ACCCTAATGCA T 

13 61 GTAGTGC AACTCTTACAAATACAATTGATCCAGAGAGAAT 

T C T C C G 
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1401 TAATCAAATACCTTTAGTGAAAGGATTTAGAGTTTGGGGG 14 40 
C CAGCGTCCTG A 

14 41 GGC ACCTCTGTC ATTAC AGG AC CAGGATTT AC AGGAGGGG 1480 
AT C C C T 

14 81 ATATCCTTCGAAGAAATACCTTTGGTGATTTTGTATCTCT 1520 

T A C T C. C GAGC 

1521 AC AAGTC AATATT AATTC ACCAATTACCCAAAGATACCGT 15 60 
C TCCCT T T 

15 61 TTAAGATTTCGTTACGCTTCCAGTAGGGATGCACGAGTTA 1 600 

C C G A TTCCC T C TA C 

1601 TAGTATTAACAGGAGCGGCATCCACAGGAGTGGGAGGCCA 1640 
CGCCCCATTCTCTA 

1641 AGTTAGTGTAAATATGCCTCTTCAGAAAACTATGGAAATA 1680 
CTCC G C AC G .G C 

1681 GGGGAG AACTT AAC ATCT AGAAC ATTTAGATATACCG ATT 1720 
C GCGCCCC 

17 21 TTAGTAATCCTTTTTCATTTAGAGCTAATCCAGATATAAT 17 60 
CTC C CAGT CC T C C T C C 

17 61 TGGGATAAGTGAACAACCTCTATTTGGTGCAGGTTCTATT 1800 
CTC C AT AGC C 

1801 AGTAGCGGTGAACTTTATATAGATAA/^TTGAAATTATTC 1840 
TCATCT C TGCTCG GC 

184 1 TAGCAGATGCAACATTTGAAGCAGAATCTGATTTAGAAAG 1880 
TCCTCCCGTG ACA CC T G 

1881 AGCACAAAAGGCGGTGAATGCCCTGTTTACTTCTTCCAAT 1920 
C G T C C C CA 

1921 CAAATCGGGTTAAAAACCGATGTGACGGATTATCATATTG 1960 
GCTCG TACTTC C 

1961 ATCAAGTATCCAATTTAGTGGATTGTTTATCAGATG AATT 2000 
C G C G CACC ACC TAGC G 

2 001 TTGTCTGGATGAAAAGCG AGAATTGTCCGAGAAAGTCAAA 2040 

CCCCG TCC T 

2 04 1 CATGCGAAGCGACTCAGTGATGAGCGGAATTTACTTCAAG 2080 
CC T CCA CCTG 

2 081 ATCC AAACTTC AG AGGG ATC AATAG AC AAC C AG ACC GTGG 2120 
CT C A AC C G G A 
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■ • • • 

2121 CTGGAG AGGAAGTACAG ATATTACCATCCAAGGAGG AGAT 2160 
T.GT CCGGC CC 

2161 GACGTATTCAAAGAGAATTACGTCACACTACCGGGTACCG 2200 
TG G C CCTCATT 

2 201 TTGATG AGTGCTATCCAACGTATTTATATC AGAAAATAGA 2240 
CC CTCCGC GC 

2 2 41 TGAGTCGAAATT AAAAGCTTATACCCGTTATGAATTAAGA 2280 
C CC CTC AG CCT 

2281 GGGTATATCGAAGATAGTCAAGACTTAGAAATCTATTTGA 2320 
CC CC CT CC 

2321 TCCGTTACAATGC AAAAC ACG AAATAGTAAATGTGCCAGG 2360 
AG CG GCCG C 

2 3 61 CACGGGTTCCTTATGGCCGCTTTC AGCCCAAATGCC AATC 2 400 
T T C C A T TCT C T 

2 401 GGAAAGTGTGGAGAACCGAATCGATGCGCGCCACACCTTG 2 4 40 
G G T CA T 

2 4 41 AATGGAATCCTG ATCTAGATTGTTCCTGCAGAGACGGGGA 2 480 
G CTGCC GTC 

24 81 AAAATGTGCAC ATCATTCCC ATC ATTTC ACCTTGG ATATT 2520 
GG CC T CT CC 

2 521 GATGTTGGATGTACAGACTTAAATGAGGACTTAGGTGTAT 2560 
G TCG CCAC 

2 5 61 GGGTGATATTCAAGATTAAGACGC AAGATGGCCATGCAAG 2 600 
C C C C C A C 

2 601 ACTAGGG AATCTAGAGTTTCTCG AAGAG AAACCATTATTA 2 640 
T C C T GG C 

2 641 GGGGAAGCACTAGCTCGTGTGAAAAGAGCGGAGAAGAAGT 2 680 
T T C G A 

2 681 GGAGAGACAAACGAGAGAAACTGCAGTTGGAAACAAATAT 2 720 
G T CG A G T C 

27 21 TGTTTATAAAGAGGCAAAAGAATCTGTAG ATGCTTTATTT 2760 
C CG C GCG GC 

... • 
2 7 61 GTAAACTCTCAATATGAT AG ATTAC AAGTGGATACGAACA 2 800 
G C CAG G CC C C 

2 801 TCGCCATGATTCATGCGGCAGATAAACGCGTTCATAGAAT 2 84 0 

CCC C TGCC 
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• • • 

2841 CCGGGAAGCGTATCTGCCAGAGTTGTCTGTGATTCCAGGT 2880 
TTGTCT T C CT 

• • • • 

2 881 GTCAATGCGGCC ATTTTCGAAGAATTAGAGGGACGTATTT 2 920 
GCT C GCT C 
» • • • 

2 921 TTACAGCGTATTCCTTATATGATGCGAGAAATGTCATTAA 2 960 
CATC GC C C C 

• • • • 

2 961 AAATGGCGATTTCAATAATGGCTTATTATGCTGGAACGTG 3000 

G C T C C C CAGC T 

. ♦ • • 

3001 AAAGGTCATGTAGATGTAGAAGAGCAAAACAACCACCGTT 304 0 

GCGGAG TG 

3041 CGGTCCTTGTTATCCCAGAATGGGAGGCAGAAGTGTCACA 3080 
C GGGTG AT C 

3081 AGAGGTTCGTGTCTGTCC AGGTCGTGGCTATATCCTTCGT 3120 
A A A A C T C 

• • • • 

3121 GTCACAGCATAT AAAGAGGG ATATGGAGAGGGCTGCGTAA 3160 
GCTCG CT T G 

3161 CGATCCATGAGATCGAAGACAATACAGACGAACTGAAATT 3200 
C C GACC GTG 

• • • • 

3201 CAGC AACTGTGTAGAAGAGG AAGTATATCC AAAC AACACA 3240 
TC CCGAAC C C 

3241 GTAACGTGTAAT AATTATACTGGGACTC AAGAAGAAT ATG 3280 
TTCCGCC TA G GC 

3281 AGGGT ACGTAC ACTTCTCGTAATC AAGG AT ATGACGAAGC 3320 
GA G C AGC CAG T CA 

3321 CTATGGTAATAACCCTTCCGTACCAGCTGATTACGCTTCA 33 60 
TCC TCXXXXXXXXXXXX T T C T C C 

33 61 GTCTATGAAGAAAAATCGTATACAGATGGACGAAGAGAGA 34 00 
GCGG CC CACT 

3 4 01 ATCCTTGTGAATCTAACAG AGGCT ATGGGG ATTACAC ACC 34 40 

C C G TC T CA C 

3 4 41 ACTACCGGCTGGTTATGTAACAAAGGATTTAGAGTACTTC 34 80 
TATC TC GCT T 

3 4 81 CC AG AG ACCGAT AAGGTATGGATTGAG ATCGG AG AAAC AG 3520 
T CAGC T C 

3 521 AAGGAACATTCATCGTGGATAGCGTGGAATTACTCCTTAT 35 60 
G C C GC T T G 
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1 AGATCTAGAGGTAATTGTTATGAGTACTGTCGTGGTTAAG 4 0 

GATC 

• • • • 

4 1 GGAAACGTC AACGGTGGTGTACAAC AACCTAGAAGGAGGA 8 0 
G T A 

8 1 G AAGGCAATCCCTTCGCAGG AGGGCTAACAG AGTACAGCC 120 

T A T 

■ ■ ■ • 

121 AGTGGTTATGGTC ACTGCTCCTGGCGAACCCAGGAGGAGG 1 60 

GC A A A 

• * • • 

161 AGACGCAGAAGAGGAGGCAATCGCAGGTCAAGAAGAACTG 200 
AG T A 

■ ■ • • 

2 01 GAGTTCCC AGGGG AAGGGGCTCAAGCGAGACATTCGTGTT 240 
A AT 

241 TACAAAGGACAACCTCGTGGGCAACTCCCAAGGAAGTTTC 280 



281 ACCTTCGGACCAAGTGTATCAGACTGTCCAGCATTCAAGG 320 

T 

321 ATGGAATACTCAAGGCCTACCATGAGTACAAGATCACAAG 3 60 

T 

3 61 TATCCTTCTTCAGTTCGTCAGCGAGGCCTCTTCCACCTCA 400 

T G T 

401 CCAGGATCCATCGCTTATGAGTTGGACCCACATTGCAAAG ' 4 40 
C AT 

4 41 TATCATCCCTCCAGTCCTACGTCAACAAGTTCCAAATCAC 4 80 

T 

4 81 AAAGGGAGGAGCTAAGACCTATCAAGCTAGGATGATCAAC 520 

T T C T 

• • • • 

521 GGAGTAGAATGGC ACGATTCATCTG AGGATC AGTGCAGGA 5 60 
T T A 

5 61 TACTTTGGAAAGGAAGTGGAAAATCTTCAGACCCAGC AGG 600 

C A G T T 

601 ATCTTTCAGAGTCACCATCAGAGTGGCTCTTCAAAACCCC 64 0 

T T A 

64 1 AAGTAATAGACTCCGGATCAGAGCCTGGTCCAAGCCCACA 680 
A T 
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» • • • 

681 ACCAAC ACCC ACTCC AACTCCCC AAAAGC ATG AGCGATTT 7 20 

• • • • 

721 ATTGCTTACGTCGGCATACCTATGCTGACCATTCAAGAAT 7 60 

7 61 TC 7 62 
SYNTHETIC PLANT GENES FROM BT 
KURSTAKI AND METHOD FOR 
PREPARATION 

This is a continuation of application U.S. Ser. No. 
08/433,111, filed on May 3, 1995, now abandoned, which is 
a divisional of U.S. Ser. No. 07/959,506, filed Oct. 9, 1992, 
now U.S. Pat. No. 5,500,365, which is a continuation of U.S. 
Ser. No. 07/476,661, filed Feb. 12, 1990, now abandoned, 
which is a continuation-in-part of U.S. Ser. No. 07/315,355, 
filed Feb. 24, 1989, now abandoned. 

BACKGROUND OF THE INVENTION 

The present invention relates to genetic engineering and 
more particularly to plant transformation in which a plant is 
transformed to express a heterologous gene. 

Although great progress has been made in recent years 
with respect to transgenic plants which express foreign 
proteins such as herbicide resistant enzymes and viral coat 
proteins, very little is known about the major factors 
affecting expression of foreign genes in plants. Several potential 
factors could be responsible in varying degrees for the level 
of protein expression from a particular coding sequence. The 
level of a particular mRNA in the cell is certainly a critical 
factor. 

The potential causes of low steady state levels of mRNA 
due to the nature of the coding sequence are many. First, full 
length RNA synthesis may not occur at a high frequency. 
This could, for example, be caused by the premature 
termination of RNA during transcription or due to unexpected 
mRNA processing during transcription. Second, full length 
RNA could be produced but then processed (splicing, polyA 
addition) in the nucleus in a fashion that creates a 
nonfunctional mRNA. If the RNA is properly synthesized, 
terminated and polyadenylated, it then can move to the cytoplasm 
for translation. In the cytoplasm, mRNAs have distinct half 
lives that are determined by their sequences and by the cell 
type in which they are expressed. Some RNAs are very 
short-lived and some are much more long-lived. In addition, 
there is an effect, whose magnitude is uncertain, of 
translational efficiency on mRNA half-life. In addition, every RNA 
molecule folds into a particular structure, or perhaps family 
of structures, which is determined by its sequence. The 
particular structure of any RNA might lead to greater or 
lesser stability in the cytoplasm. Structure per se is probably 
also a determinant of mRNA processing in the nucleus. 
Unfortunately, it is impossible to predict, and nearly 
impossible to determine, the structure of any RNA (except for 
tRNA) in vitro or in vivo. However, it is likely that 
dramatically changing the sequence of an RNA will have a large 
effect on its folded structure. It is likely that structure per se 
or particular structural features also have a role in 
determining RNA stability. 

Some particular sequences and signals have been 
identified in RNAs that have the potential for having a specific 
effect on RNA stability. This section summarizes what is 
known about these sequences and signals. These identified 
sequences often are A+T rich, and thus are more likely to 
occur in an A+T rich coding sequence such as a B.t. gene. 
The sequence motif ATTTA (or AUUUA as it appears in 
RNA) has been implicated as a destabilizing sequence in 
mammalian cell mRNA (Shaw and Kamen, 1986). No 
analysis of the function of this sequence in plants has been 
done. Many short-lived mRNAs have A+T rich 3' 
untranslated regions, and these regions often have the ATTTA 
sequence, sometimes present in multiple copies or as 
multimers (e.g., ATTTATTTA . . .). Shaw and Kamen showed that 
the transfer of the 3' end of an unstable mRNA to a stable 
RNA (globin or VA1) decreased the stable RNA's half life 
dramatically. They further showed that a pentamer of 
ATTTA had a profound destabilizing effect on a stable 
message, and that this signal could exert its effect whether 
it was located at the 3' end or within the coding sequence. 
However, the number of ATTTA sequences and/or the 
sequence context in which they occur also appear to be 
important in determining whether they function as 
destabilizing sequences. Shaw and Kamen showed that a trimer of 
ATTTA had much less effect than a pentamer on mRNA 
stability and a dimer or a monomer had no effect on stability 
(Shaw and Kamen, 1987). Note that multimers of ATTTA 
such as a pentamer automatically create an A+T rich region. 
This was shown to be a cytoplasmic effect, not nuclear. In 
other unstable mRNAs, the ATTTA sequence may be present 
in only a single copy, but it is often contained in an A+T rich 
region. From the animal cell data collected to date, it appears 
that ATTTA at least in some contexts is important in stability, 
but it is not yet possible to predict which occurrences of 
ATTTA are destabilizing elements or whether any of these 
effects are likely to be seen in plants. 
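
As an illustration only, and not part of the patent text, the kind of motif survey discussed above can be sketched in a few lines of Python. The function names (`normalize`, `find_attta_sites`, `attta_multimers`) are hypothetical, and the sketch assumes that a "multimer" means ATTTA units sharing their terminal A, as in the ATTTATTTA example. It reports where the motif occurs and how many copies sit in tandem; it says nothing about whether a given occurrence is actually destabilizing.

```python
import re


def normalize(seq: str) -> str:
    """Upper-case the sequence, drop whitespace, and write RNA U as DNA T."""
    return "".join(seq.split()).upper().replace("U", "T")


def find_attta_sites(seq: str) -> list[int]:
    """Return the 0-based start of every ATTTA occurrence, overlaps included."""
    # A zero-width lookahead lets overlapping occurrences each be reported.
    return [m.start() for m in re.finditer(r"(?=ATTTA)", normalize(seq))]


def attta_multimers(seq: str) -> list[tuple[int, int]]:
    """Return (start, copies) for each run of ATTTA units sharing an end base."""
    # ATTTA is one copy (5 bases); each further overlapping copy adds 4 bases.
    runs = re.finditer(r"A(?:TTTA)+", normalize(seq))
    return [(m.start(), (len(m.group()) - 1) // 4) for m in runs]


# Row 1441-1480 of the native sequence as printed in the figures (OCR spacing kept).
row_1441 = "TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGG ATTTA"
print(find_attta_sites(row_1441))
print(attta_multimers("GGATTTATTTACCATTTAGG"))
```

Positions are counted from zero within the string passed in, so they must be offset by the row's starting coordinate to match the numbering in the figures.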

Some studies on mRNA degradation in animal cells also 
indicate that RNA degradation may begin in some cases 
with nucleolytic attack in A+T rich regions. It is not clear if 
these cleavages occur at ATTTA sequences. There are also 
examples of mRNAs that have differential stability 
depending on the cell type in which they are expressed or on the 
stage within the cell cycle at which they are expressed. For 
example, histone mRNAs are stable during DNA synthesis 
but unstable if DNA synthesis is disrupted. The 3' end of 
some histone mRNAs seems to be responsible for this effect 
(Pandey and Marzluff, 1987). It does not appear to be 
mediated by ATTTA, nor is it clear what controls the 
differential stability of this mRNA. 

Another example is the differential stability of IgG mRNA 
in B lymphocytes during B cell maturation (Genovese and 
Milcarek, 1988). A final example is the instability of a 
mutant beta-thalassemic globin mRNA. In bone marrow
cells, where this gene is normally expressed, the mutant 
mRNA is unstable, while the wild-type mRNA is stable. 
When the mutant gene is expressed in HeLa or L cells in 
vitro, the mutant mRNA shows no instability (Lim et al., 
1988). These examples all provide evidence that mRNA
stability can be mediated by cell type or cell cycle specific 
factors. Furthermore this type of instability is not yet asso- 
ciated with specific sequences. Given these uncertainties, it 
is not possible to predict which RNAs are likely to be 
unstable in a given cell. In addition, even the ATTTA motif 
may act differentially depending on the nature of the cell in 
which the RNA is present. Shaw and Kamen (1987) have 
reported that activation of protein kinase C can block 
degradation mediated by ATTTA.

The addition of a polyadenylate string to the 3' end is 
common to most eucaryotic mRNAs, both plant and animal. 
The currently accepted view of polyA addition is that the 
nascent transcript extends beyond the mature 3' terminus. 
Contained within this transcript are signals for polyadeny-
lation and proper 3' end formation. This processing at the 3'
end involves cleavage of the mRNA and addition of polyA
to the mature 3' end. By searching for consensus sequences
near the polyA tract in both plant and animal mRNAs, it has 
been possible to identify consensus sequences that appar- 
ently are involved in polyA addition and 3' end cleavage. 
The same consensus sequences seem to be important to both 
of these processes. These signals are typically a variation on
the sequence AATAAA. In animal cells, some variants of
this sequence that are functional have been identified; in 
plant cells there seems to be an extended range of functional 
sequences (Wickens and Stephenson, 1984; Dean et al., 
1986). Because all of these consensus sequences are varia- 
tions on AATAAA, they all are A+T rich sequences. This 
sequence is typically found 15 to 20 bp before the poly A 
tract in a mature mRNA. Experiments in animal cells 
indicate that this sequence is involved in both polyA addi- 
tion and 3' maturation. Site directed mutations in this 
sequence can disrupt these functions (Conway and Wickens, 
1988; Wickens et al., 1987). However, it has also been 
observed that sequences up to 50 to 100 bp 3' to the putative 
polyA signal are also required; i.e., a gene that has a normal 
AATAAA but has been replaced or disrupted downstream
does not get properly polyadenylated (Gil and Proudfoot, 
1984; Sadofsky and Alwine, 1984; McDevitt et al., 1984). 
That is, the polyA signal itself is not sufficient for complete 
and proper processing. It is not yet known what specific 
downstream sequences are required in addition to the polyA 
signal, or if there is a specific sequence that has this function. 
Therefore, sequence analysis can only identify potential 
polyA signals. 

In naturally occurring mRNAs that are normally
polyadenylated, it has been observed that disruption of this
process, either by altering the polyA signal or other
sequences in the mRNA, can have profound effects on
the level of functional mRNA. This has been observed in
several naturally occurring mRNAs, with results that are
gene specific so far. There are no general rules that can be 
derived yet from the study of mutants of these natural genes, 
and no rules that can be applied to heterologous genes. 
Below are four examples: 

1. In a globin gene, absence of a proper polyA site leads 
to improper termination of transcription. It is likely, but 
not proven, that the improperly terminated RNA is 
nonfunctional and unstable (Proudfoot et al., 1987). 

2. In a globin gene, absence of a functional polyA signal 
can lead to a 100- fold decrease in the level of mRNA 
accumulation (Proudfoot et al., 1987). 

3. A globin gene polyA site was placed into the 3' ends of 
two different histone genes. The histone genes contain 
a secondary structure (stem-loop) near their 3' ends.
The amount of properly polyadenylated histone mRNA 
produced from these chimeras decreased as the distance 
between the stem-loop and the polyA site increased. 
Also, the two histone genes produced greatly different 
levels of properly polyadenylated mRNA. This sug- 
gests an interaction between the polyA site and other 
sequences on the mRNA that can modulate mRNA 
accumulation (Pandey and Marzluff, 1987).

4. The soybean leghemoglobin gene has been cloned into 
HeLa cells, and it has been determined that this plant 
gene contains a "cryptic" polyadenylation signal that is 
active in animal cells, but is not utilized in plant cells.
This leads to the production of a new polyadenylated 
mRNA that is nonfunctional. This again shows that 
analysis of a gene in one cell type cannot predict its 
behavior in alternative cell types (Wiebauer et al., 
1988). 

From these examples, it is clear that in natural mRNAs 
proper polyadenylation is important in mRNA accumulation,
and that disruption of this process can affect mRNA levels
significantly. However, insufficient knowledge exists to pre- 
dict the effect of changes in a normal gene. In a heterologous 
gene, where we do not know if the putative polyA sites
(consensus sequences) are functional, it is even harder to
predict the consequences. However, it is possible that the
putative sites identified are dysfunctional. That is, these sites
may not act as proper polyA sites, but instead function as
aberrant sites that give rise to unstable mRNAs.

In animal cell systems, AATAAA is by far the most 
common signal identified in mRNAs upstream of the polyA, 
but at least four variants have also been found (Wickens and 
Stephenson, 1984). In plants, not nearly so much analysis 
has been done, but it is clear that multiple sequences similar 
to AATAAA can be used. The plant sites below called major 
or minor refer only to the study of Dean et al. (1986) which 
analyzed only three types of plant gene. The designation of 
polyadenylation sites as major or minor refers only to the 
frequency of their occurrence as functional sites in naturally 
occurring genes that have been analyzed. In the case of 
plants this is a very limited database. It is hard to predict 
with any certainty that a site designated major or minor is 
more or less likely to function partially or completely when
found in a heterologous gene such as B.t.

                  ATAAAA
                  ATGAAA
                  AAGCAT
                  ATTAAT
        P10A      ATACAT
        P11A      AAAATA
        P12A      ATTAAA
        P13A      AATTAA
        P14A      AATACA
        P15A      CATAAA

Another type of RNA processing that occurs in the 
nucleus is intron splicing. Nearly all of the work on intron 
processing has been done in animal cells, but some data is
emerging from plants. Intron processing depends on proper 
5' and 3' splice junction sequences. Consensus sequences for 
these junctions have been derived for both animal and plant 
mRNAs, but only a few nucleotides are known to be 
invariant. Therefore, it is hard to predict with any certainty 
whether a putative splice junction is functional or partially 
functional based solely on sequence analysis. In particular, 
the only invariant nucleotides are GT at the 5' end of the 
intron and AG at the 3' end of the intron. In plants, at every
nearby position, either within the intron or in the exon
flanking the intron, all four nucleotides can be found,
although some positions show some nucleotide preference
(Brown, 1986; Hanley and Schuler, 1988).

A plant intron has been moved from a patatin gene into a 
GUS gene. To do this, site directed mutagenesis was per- 
formed to introduce new restriction sites, and this mutagen- 
esis changed several nucleotides in the intron and exon 
sequences flanking the GT and AG. This intron still func-
tioned properly, indicating the importance of the GT and AG
and the flexibility at other nucleotide positions. There are of
course many occurrences of GT and AG in all genes that do
not function as intron splice junctions, so there must be some
other sequence or structural features that identify splice
junctions. In plants, one such feature appears to be base 
composition per se. Wiebauer et al. (1988) and Goodall et al. 
(1988) have analyzed plant introns and exons and found that 
exons have ~50% A+T while introns have ~70% A+T.
Goodall et al. (1988) also created an artificial plant intron
that has consensus 5' and 3' splice junctions and a random
A+T rich internal sequence. This intron was spliced cor- 
rectly in plants. When the internal segment was replaced by 
a G+C rich sequence, splicing efficiency was drastically
reduced. These two examples demonstrate that intron rec-
ognition in plants may depend on very general features:
splice junctions that have a great deal of sequence diversity 
and A+T richness of the intron itself. This, of course, makes 
it difficult to predict from sequence alone whether any 
particular sequence is likely to function as an active or 
partially active intron for RNA processing. 

B.t. genes, being A+T rich, contain numerous stretches of
various lengths that have 70% or greater A+T. The number 
of such stretches identified by sequence analysis depends on 
the length of sequence scanned. 
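
The dependence on scan length can be illustrated with a sliding-window count. The following is a minimal sketch under stated assumptions: the function names, the example sequence and the window lengths are hypothetical, and only the 70% A+T threshold comes from the text.

```python
def at_fraction(seq):
    """Fraction of bases in seq that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)

def at_rich_windows(seq, window, threshold=0.70):
    """Start positions of every window of the given length with A+T >= threshold."""
    return [i for i in range(len(seq) - window + 1)
            if at_fraction(seq[i:i + window]) >= threshold]

# Hypothetical sequence; the count of qualifying windows changes with window length.
example = "ATGAATAAATTTGCGCGCATTTAATACCGGCC"
for window in (6, 10, 20):
    print(window, len(at_rich_windows(example, window)))
```

Shorter windows tend to flag many small A+T rich stretches, while longer windows flag only extended regions, which is why the number of stretches reported depends on the length scanned.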

As for polyadenylation described above, there are com- 
plications in predicting what sequences might be utilized as 
splice sites in any given gene. First, many naturally occurring
genes have alternative splicing pathways that create alter-
native combinations of exons in the final mRNA (Gallego
and Nadal-Ginard, 1988; Helfman and Ricci, 1988; Tsu-
rushita and Korn, 1989). That is, some splice junctions are
apparently recognized under some circumstances or in cer-
tain cell types, but not in others. The rules governing this are
not understood. In addition, there can be an interaction 
between processing paths such that utilization of a particular 
polyadenylation site can interfere with splicing at a nearby 
splice site and vice versa (Adami and Nevins, 1988; Brady 
and Wold, 1988; Marzluff and Pandey, 1988). Again no 
predictive rules are available. Also, sequence changes in a 
gene can drastically alter the utilization of particular splice 
junctions. For example, in a bovine growth hormone gene, 
small deletions in an exon a few hundred bases downstream 
of an intron cause the splicing efficiency of the intron to drop 
from greater than 95% to less than 2% (essentially 
nonfunctional). Other deletions however have essentially no 
effect (Hampson and Rottman, 1988). Finally, a variety of in 
vitro and in vivo experiments indicate that mutations that 
disrupt normal splicing lead to rapid degradation of the RNA 
in the nucleus. Splicing is a multistep process in the nucleus 
and mutations in normal splicing can lead to blockades in the 
process at a variety of steps. Any of these blockades can then 
lead to an abnormal and unstable RNA. Studies of mutants
of normally processed (polyadenylation and splicing) genes 
are relevant to the study of heterologous genes such as B.t. 
B.t. genes might contain functional signals that lead to the 
production of aberrant nonfunctional mRNAs, and these 
mRNAs are likely to be unstable. But the B.t. genes are
perhaps even more likely to contain signals that are analo- 
gous to mutant signals in a natural gene. As shown above 
these mutant signals are very likely to cause defects in the 
processing pathways whose consequence is to produce 
unstable mRNAs. 

It is not known with any certainty what signals RNA 
transcription termination in plant or animal cells. Some 
studies on animal genes indicate that stretches of
sequence rich in T cause termination by calf thymus RNA 
polymerase II in vitro. These studies have shown that the 3' 
ends of in vitro terminated transcripts often lie within runs 
of T such as T5, T6 or T7. Other identified sites have not 
been composed solely of T, but have had one or more other 
nucleotides as well. Termination has been found to occur 
within the sequences TATTTTTT, ATTCTC, TTCTT 
(Dedrick et al., 1987; Reines et al., 1987). In the case of
these latter two, the context in which the sequence is found 
has been C+T rich as well. It is not known if this is essential. 
Other studies have implicated stretches of A as potential
transcriptional terminators. An interesting example from
SV40 illustrates the uncertainty in defining terminators
based on sequence alone. One potential terminator in SV40
was identified as being A rich and having a region of dyad
symmetry (potential stem-loop) 5' to the A rich stretch.
However, a second terminator identified experimentally
downstream in the same gene was not A rich and included
no potential secondary structure (Kessler et al., 1988). Of
course, due to the A+T content of B.t. genes, they are rich in
runs of A or T that could act as terminators. The importance
of termination to stability of the mRNA is shown by the
globin gene example described above. Absence of a normal
polyA site leads to a failure in proper termination with a
consequent decrease in mRNA.

There is also an effect on mRNA stability due to the
translation of the mRNA. Premature translational termina- 
tion in human triose phosphate isomerase leads to instability 
of the mRNA (Daar et al., 1988). Another example is the 
beta-thalassemic globin mRNA described above that is spe-
cifically unstable in bone marrow cells (Lim et al., 1988).
The defect in this mutant gene is a single base pair deletion
at codon 44 that leads to translational termination (a non-
sense codon) at codon 60. Compared to properly translated
normal globin mRNA, this mutant RNA is very unstable.
These results indicate that an improperly translated mRNA
is unstable. Other work in yeast indicates that proper but
poor translation can have an effect on mRNA levels. A
heterologous gene was modified to convert certain codons to
more yeast preferred codons. An overall 10-fold increase in
protein production was achieved, but there was also about a
3-fold increase in mRNA (Hoekema et al., 1987). This
indicates that more efficient translation can lead to greater 
mRNA stability, and that the effect of codon usage can be at 
the RNA level as well as the translational level. It is not clear 
from codon usage studies which codons lead to poor 
translation, or how this is coupled to mRNA stability. 

SUMMARY OF THE INVENTION 

It is an object of the present invention to provide a method 
for preparing synthetic plant genes which express their 
respective proteins at relatively high levels when compared 
to wild-type genes. It is yet another object of the present 
invention to provide synthetic plant genes which express the 
crystal protein toxin of Bacillus thuringiensis at relatively 
high levels. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIGS. 1A-1B illustrate the steps employed in modifying
a wild-type gene to increase expression efficiency in plants. 

FIGS. 2A-2C illustrate a comparison of the changes in the 
modified B.t.k. HD-1 sequence of Example 1 (lower line)
(SEQ ID NO:20) versus the wild-type sequence of B.t.k. 
HD-1 which encodes the crystal protein toxin (upper line). 

FIGS. 3A-3C illustrate a comparison of the changes in the 
synthetic B.t.k. HD-1 sequence of Example 2 (lower line) 
(SEQ ID NO:22) versus the wild-type sequence of B.t.k. 
HD-1 which encodes the crystal protein toxin (upper line). 

FIGS. 4A-4C illustrate a comparison of the changes in the 
synthetic B.t.k. HD-73 sequence of Example 3 (lower line) 
(SEQ ID NO:23) versus the wild-type sequence of B.t.k.
HD-73 (upper line). 

FIG. 5 represents a plasmid map of intermediate plant 
transformation vector cassette pMON893.

FIG. 6 represents a plasmid map of intermediate plant 
transformation vector cassette pMON900.

FIG. 7 represents a map for the disarmed T-DNA of A.
tumefaciens ACO.

FIGS. 8A-8C illustrate a comparison of the changes in the 
synthetic truncated B.t.k. HD-73 gene (Amino acids 29-615
with an N-terminal Met-Ala) of Example 3 (lower line)
(SEQ ID NO:21) versus the wild-type sequence of B.t.k.
HD-73 (upper line). 

FIGS. 9A-9E illustrate a comparison of the changes in the 
synthetic/wild-type full length B.t.k. HD-73 sequence of 
Example 3 (lower line) (SEQ ID NO:24) versus the wild- 
type full-length sequence of B.t.k. HD-73 (upper line). 

FIGS. 10A-10E illustrate a comparison of the changes in
the synthetic/modified full length B.t.k. HD-73 sequence of
Example 3 (lower line) (SEQ ID NO:25) versus the wild-
type full-length sequence of B.t.k. HD-73 (upper line).

FIGS. 11A-11E illustrate a comparison of the changes in
the fully synthetic full-length B.t.k. HD-73 sequence of
Example 3 (lower line) (SEQ ID NO:26) versus the wild-
type full-length sequence of B.t.k. HD-73 (upper line).

FIGS. 12A-12C illustrate a comparison of the changes in 
the synthetic B.t.t. sequence of Example 5 (lower line) 
versus the wild-type sequence of B.t.t. which encodes the 
crystal protein toxin (upper line). 

FIGS. 13A-13C illustrate a comparison of the changes in
the synthetic B.t. P2 sequence of Example 6 (lower line) 
versus the wild-type sequence of B.t.k. HD-1 which encodes 
the P2 protein toxin (upper line). 

FIGS. 14A-14E illustrate a comparison of the changes in 
the synthetic B.t. entomocidus sequence of Example 7 
(lower line) versus the wild-type sequence of B.t. entomoci-
dus which encodes the Btent protein toxin (upper line).

FIG. 15 illustrates a plasmid map for plant expression 
cassette vector pMON744. 

FIGS. 16A-16B illustrate a comparison of the changes in
the synthetic potato leaf roll virus (PLRV) coat protein 
sequence of Example 9 (lower line) versus the wild-type 
coat protein sequence of PLRV (upper line). 

DETAILED DESCRIPTION OF THE 
INVENTION 

The present invention provides a method for preparing 
synthetic plant genes which genes express their protein 
product at levels significantly higher than the wild-type 
genes which were commonly employed in plant transfor- 
mation heretofore. In another aspect, the present invention 
also provides novel synthetic plant genes which encode 
non-plant proteins. 

For brevity and clarity of description, the present inven- 
tion will be primarily described with respect to the prepa- 
ration of synthetic plant genes which encode the crystal 
protein toxin of Bacillus thuringiensis (B.t.). Suitable B.t.
subspecies include, but are not limited to, B.t. kurstaki
HD-1, B.t. kurstaki HD-73, B.t. sotto, B.t. berliner, B.t.
thuringiensis, B.t. tolworthi, B.t. dendrolimus, B.t. alesti, B.t.
galleriae, B.t. aizawai, B.t. subtoxicus, B.t. entomocidus, B.t.
tenebrionis and B.t. san diego. However, those skilled in the
art will recognize and it should be understood that the 
present method may be used to prepare synthetic plant genes 
which encode non-plant proteins other than the crystal 
protein toxin of B.t. as well as plant proteins (see for 
instance, Example 9).

The expression of B.t. genes in plants is problematic. 
Although the expression of B.t. genes in plants at insecti- 
cidal levels has been reported, this accomplishment has not 
been straightforward. In particular, the expression of a
full-length lepidopteran specific B.t. gene (comprising DNA
from a B.t.k. isolate) has been reported to be unsuccessful in
yielding insecticidal levels of expression in some plant
species (Vaeck et al., 1987 and Barton et al., 1987).

It has been reported that expression of the full-length gene
from B.t.k. HD-1 was detectable in tomato plants but that 
truncated genes led to a higher frequency of insecticidal 
plants with an overall higher level of expression. Truncated 
genes of B.t. berliner also led to a higher frequency of 
insecticidal plants in tobacco (Vaeck et al., 1987). On the 
other hand, insecticidal plants were provided from lettuce 
transformants using a full-length gene.

It has also been reported that the full length gene from 
B.t.k. HD-73 gave some insecticidal effect in tobacco (Adang
et al., 1987). However, the B.t. mRNA detected in these
plants was only 1.7 kb compared to the expected 3.7 kb 
indicating improper expression of the gene. It was suggested 
that this truncated mRNA was too short to encode a func- 
tional truncated toxin, but there must have been a low level 
of longer mRNA in some plants or no insecticidal activity 
would have been observed. Others have reported in a 
publication that they observed a large amount of shorter than 
expected mRNA from a truncated B.t.k. gene, but some 
mRNA of the expected size was also observed. In fact, it was 
suggested that expression of the full length gene is toxic to
tobacco callus (Barton et al., 1987). 

The above illustrates that lepidopteran type B.t. genes are 
poorly expressed in plants compared to other chimeric genes 
previously expressed from the same promoter cassettes. 

The expression of B.t.t. in tomato and potato is at levels
similar to that of B.t.k. (i.e., poor). B.t.t. and B.t.k. genes 
share only limited sequence homology, but they share many 
common features in terms of base composition and the 
presence of particular A+T rich elements. 

All reports in the field have noted the lower than expected
expression of B.t. genes in plants. In general, insecticidal
efficacy has been measured using insects very sensitive to
B.t. toxin such as tobacco hornworm. Although it has been
possible to obtain plants totally protected against tobacco
hornworm, it is important to note that hornworm is up to 500
fold more sensitive to B.t. toxin than some agronomically 
important insect pests such as beet armyworm. It is therefore 
of interest to obtain transgenic plants that are protected 
against all important lepidopteran pests (or against Colorado 
potato beetle in the case of B.t. tenebrionis), and in addition
to have a level of B.t. expression that provides an additional 
safety margin over and above the efficacious protection 
level. It is also important to devise plant genes which 
function reproducibly from species to species, so that insect 
resistant plants can be obtained in a predictable fashion.

In order to achieve these goals, it is important to under- 
stand the nature of the poorer than expected expression of 
B.t. genes in plants. The level of stable B.t. mRNA in plants 
is much lower than expected. That is, compared to other 
coding sequences driven by the same promoter, the level of
B.t. mRNA measured by Northern analysis or nuclease
protection experiments is much lower. For example, tomato
plant 337 (Fischhoff et al., 1987) was selected as the best
expressing plant with pMON9711 which contains the B.t.k.
HD-1 KpnI fragment driven by the CaMV 35S promoter and
contains the NOS-NPTII-NOS selectable marker gene. In
this plant the level of B.t. mRNA is between 100 and 1000 fold
lower than the level of NPTII mRNA, even though the 35S
promoter is approximately 50-fold stronger than the NOS
promoter (Sanders et al., 1987).
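
The size of the discrepancy in the plant 337 example can be worked through numerically. The following is a minimal sketch which assumes, purely for illustration, that mRNA accumulation would otherwise scale linearly with promoter strength; that assumption and the variable names are not from the source.

```python
# Figures taken from the text above.
promoter_ratio = 50             # 35S promoter strength relative to NOS
observed_deficit = (100, 1000)  # B.t. mRNA is this many fold below NPTII mRNA

# Under the linear-scaling assumption the expected B.t./NPTII mRNA ratio equals
# promoter_ratio, so the shortfall relative to that expectation is
# promoter_ratio multiplied by each observed deficit.
shortfall = tuple(promoter_ratio * deficit for deficit in observed_deficit)
print(shortfall)  # (5000, 50000)
```

On this reading, the B.t. transcript accumulates to a level roughly 5,000 to 50,000 fold below what promoter strength alone would suggest.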

The level of B.t. toxin protein detected in plants is 
consistent with the low level of B.t. mRNA. Moreover, the
insecticidal efficacy of the transgenic plants correlates with
the B.t. protein level indicating that the toxin protein pro-
duced in plants is biologically active. Therefore, the low
level of B.t. toxin expression may be the result of the low
levels of mRNA.

Messenger RNA levels are determined by the rate of 
synthesis and rate of degradation. It is the balance between 
these two that determines the steady state level of mRNA.
The rate of synthesis has been maximized by the use of the
CaMV 35S promoter, a strong constitutive plant expressible
promoter. The use of other plant promoters such as nopaline
synthase (NOS), mannopine synthase (MAS) and ribulose
bisphosphate carboxylase small subunit (RUBISCO) has
not led to dramatic changes in the levels of B.t. toxin protein
expression indicating that the effects determining B.t. toxin
protein levels are promoter independent. These data imply
that the coding sequences of DNA genes encoding B.t. toxin
proteins are somehow responsible for the poor expression
level, and that this effect is manifested by a low level of
accumulated stable mRNA.
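
The balance between synthesis and degradation can be illustrated with a simple kinetic model. The following is a minimal sketch which assumes constant synthesis and first-order decay; this model, the function names and all numerical values are assumptions introduced here for illustration and are not stated in the source.

```python
import math

def steady_state_level(k_syn, k_deg):
    """Steady-state mRNA level for constant synthesis and first-order decay."""
    return k_syn / k_deg

def half_life(k_deg):
    """Half-life implied by a first-order degradation rate constant."""
    return math.log(2) / k_deg

# Hypothetical values: the same promoter (same k_syn) but a tenfold faster decay.
stable = steady_state_level(k_syn=100.0, k_deg=0.1)
unstable = steady_state_level(k_syn=100.0, k_deg=1.0)
print(stable / unstable)
```

Under this model, a transcript that is degraded faster accumulates to a proportionally lower steady-state level even when the promoter, and hence the rate of synthesis, is unchanged.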

Lower than expected levels of mRNA have been observed 
with four different lepidopteran specific genes (two from 
B.t.k. HD-1; B.t. berliner and B.t.k. HD-73) as well as the 
gene from the coleopteran specific B.t. tenebrionis. It 
appears that for lepidopteran type B.t. genes these effects are
manifest more strongly in the full length coding sequences 
than in the truncated coding sequences. These effects are 
seen across plant species although their magnitude seems 
greater in some plant species such as tobacco. 

The nature of the coding sequences of B.t. genes distin-
guishes them from plant genes as well as many other
heterologous genes expressed in plants. In particular, B.t.
genes are very rich (~62%) in adenine (A) and thymine (T)
while plant genes and most bacterial genes which have been
expressed in plants are on the order of 45-55% A+T. The
A+T content of the genomes (and thus the genes) of any
organism is a feature of that organism and reflects its evo-
lutionary history. While within any one organism genes have
similar A+T content, the A+T content can vary tremendously
from organism to organism. For example, some Bacillus
species have among the most A+T rich genomes while some
Streptomyces species are among the least A+T rich genomes
(~30 to 35% A+T).

Due to the degeneracy of the genetic code and the limited 
number of codon choices for any amino acid, most of the
"excess" A+T of the structural coding sequences of some
Bacillus species is found in the third position of the codons.
That is, genes of some Bacillus species have A or T as the
third nucleotide in many codons. Thus A+T content in part
can determine codon usage bias. In addition, it is clear that
genes evolve for maximum function in the organism in
which they evolve. This means that particular nucleotide
sequences found in a gene from one organism, where they
may play no role except to code for a particular stretch of
amino acids, have the potential to be recognized as gene
control elements in another organism (such as transcriptional
promoters or terminators, polyA addition sites, intron splice
sites, or specific mRNA degradation signals). It is perhaps
surprising that such misread signals are not a more common
feature of heterologous gene expression, but this can be
explained in part by the relatively homogeneous A+T con-
tent (~50%) of many organisms. This A+T content plus the
nature of the genetic code put clear constraints on the
likelihood of occurrence of any particular oligonucleotide
sequence. Thus, a gene from E. coli with a 50% A+T content
is much less likely to contain any particular A+T rich 
segment than a gene from B. thuringiensis. 
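
The effect of base composition on the chance occurrence of an A+T rich element, and the concentration of A+T at the third codon position, can both be sketched numerically. The following is a minimal sketch under loudly stated assumptions: bases are treated as independent, A and T are taken to be equally frequent (likewise G and C), and the function names are hypothetical. Only the 50% and 62% A+T figures and the AATAAA sequence come from the text.

```python
def motif_probability(motif, at_content):
    """Chance that a random position matches motif, assuming independent bases
    with A and T equally frequent and G and C equally frequent."""
    p_at = at_content / 2.0
    p_gc = (1.0 - at_content) / 2.0
    prob = 1.0
    for base in motif.upper():
        prob *= p_at if base in "AT" else p_gc
    return prob

def at_by_codon_position(cds):
    """A+T fraction at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    cds = cds.upper()
    fractions = []
    for offset in range(3):
        bases = cds[offset::3]
        fractions.append(sum(b in "AT" for b in bases) / len(bases))
    return fractions

p_50 = motif_probability("AATAAA", 0.50)
p_62 = motif_probability("AATAAA", 0.62)
print(p_62 / p_50)  # about 3.6 under these assumptions
```

Real genes are constrained by the encoded amino acid sequence, so this independence model is only a rough guide, but it shows how a modest shift in overall A+T content multiplies the expected frequency of any all-A+T hexamer.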



As described above, the expression of B.t. toxin protein in 
plants has been problematic. Although the observations 
made in other systems described above offer the hope of a 
means to elevate the expression level of B.t. toxin proteins 
in plants, the success obtained by the present method is quite 
unexpected. Indeed, inasmuch as it has been recently
reported that expression of the full-length B.t.k. toxin protein
in tobacco makes callus tissue necrotic (Barton et al., 1987),
one would reasonably expect high level expression of
B.t. toxin protein to be unattainable due to the reported
toxicity effects.

In its most rigorous application, the method of the present 
invention involves the modification of an existing structural 
coding sequence ("structural gene") which codes for a
particular protein by removal of ATTTA sequences and 
putative polyadenylation signals by site directed mutagen- 
esis of the DNA comprising the structural gene. It is most 
preferred that substantially all the polyadenylation signals
and ATTTA sequences are removed although enhanced 
expression levels are observed with only partial removal of 
either of the above identified sequences. Alternately if a 
synthetic gene is prepared which codes for the expression of 
the subject protein, codons are selected to avoid the ATTTA 
sequence and putative polyadenylation signals. For purposes 
of the present invention putative polyadenylation signals 
include, but are not necessarily limited to, AATAAA, 
AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, 
ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, 
AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.
In replacing the ATTTA sequences and polyadenylation 
signals, codons are preferably utilized which avoid the 
codons which are rarely found in plant genomes. 
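
The scan for the sequences to be removed can be sketched as follows. This is a minimal illustration, not the procedure used in the Examples: the function name and the example sequence are hypothetical, and the signal set is simply the sixteen hexamers listed above.

```python
# The sixteen putative polyadenylation signals listed in the text
# (the fourteenth entry is taken as AATTAA).
PUTATIVE_POLYA_SIGNALS = (
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA",
    "ATAAAA", "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA",
    "ATTAAA", "AATTAA", "AATACA", "CATAAA",
)

def find_motifs(seq, motifs):
    """Map each motif found in seq to its 0-based start positions (overlaps allowed)."""
    seq = seq.upper()
    hits = {}
    for motif in motifs:
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        if positions:
            hits[motif] = positions
    return hits

example = "ATGGCAATAAAGGTCATTTACCAATTAAGG"  # hypothetical sequence
polya_hits = find_motifs(example, PUTATIVE_POLYA_SIGNALS)
attta_hits = find_motifs(example, ("ATTTA",))
print(polya_hits)
print(attta_hits)
```

Each reported position is only a candidate for mutagenesis; any replacement still has to preserve the encoded amino acid sequence and avoid codons rarely found in plant genomes.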

Another embodiment of the present invention, repre- 
sented in the flow diagram of FIG. 1, employs a method for 
the modification of an existing structural gene or alternately 
the de novo synthesis of a structural gene which method is 
somewhat less rigorous than the method first described 
above. Referring to FIG. 1, the selected DNA sequence is 
scanned to identify regions with greater than four consecu- 
tive adenine (A) or thymine (T) nucleotides. The A+T 
regions are scanned for potential plant polyadenylation 
signals. Although the absence of five or more consecutive A 
or T nucleotides eliminates most plant polyadenylation 
signals, if there are more than one of the minor polyadeny- 
lation signals identified within ten nucleotides of each other, 
then the nucleotide sequence of this region is preferably 
altered to remove these signals while maintaining the origi- 
nal encoded amino acid sequence. 
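
The first scanning step, identifying regions with greater than four consecutive A or T nucleotides, can be sketched as a regular-expression search. The function name and the example sequence are hypothetical.

```python
import re

def at_runs(seq, min_len=5):
    """Return (start, end) pairs, 0-based with end exclusive, for every run of
    at least min_len consecutive A or T nucleotides."""
    pattern = r"[AT]{%d,}" % min_len
    return [(m.start(), m.end()) for m in re.finditer(pattern, seq.upper())]

example = "GGAATTAGGCATTTTTTC"  # hypothetical sequence
print(at_runs(example))
```

The regions returned by such a scan are the ones that would then be examined for potential plant polyadenylation signals.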

The second step is to consider the 15 to 30 nucleotide 
regions surrounding the A+T rich region identified in step 
one. If the A+T content of the surrounding region is less than 
80%, the region should be examined for polyadenylation 
signals. Alteration of the region based on polyadenylation 
signals is dependent upon (1) the number of polyadenylation 
signals present and (2) presence of a major plant polyade- 
nylation signal. 
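
The second step can be sketched as follows. The text does not specify exactly how the surrounding region is delimited, so this sketch assumes, as an illustration only, that it consists of the A+T rich region plus a fixed flank of 15 to 30 nucleotides on each side; the function names and example sequence are hypothetical.

```python
def extended_region(seq, start, end, flank=15):
    """Return (lo, hi, subsequence) covering seq[start:end] plus up to `flank`
    nucleotides on each side, clipped to the ends of the sequence."""
    lo = max(0, start - flank)
    hi = min(len(seq), end + flank)
    return lo, hi, seq[lo:hi]

def at_content(region):
    """Fraction of nucleotides in the region that are A or T."""
    region = region.upper()
    return sum(base in "AT" for base in region) / len(region)

# Hypothetical sequence with a five-base A run at positions 20-24.
example = "G" * 20 + "AAAAA" + "C" * 20
lo, hi, region = extended_region(example, 20, 25, flank=15)
if at_content(region) < 0.80:
    print(f"examine positions {lo}-{hi} for polyadenylation signals")
```

The 80% cutoff is the one given in the text; the flank length would be chosen within the stated 15 to 30 nucleotide range.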

The extended region is examined for the presence of plant 
polyadenylation signals. The polyadenylation signals are 
removed by site-directed mutagenesis of the DNA sequence.
The extended region is also examined for multiple copies of 
the ATTTA sequence which are also removed by mutagen- 
esis. 

It is also preferred that regions comprising many con- 
secutive A+T bases or G+C bases are disrupted since these 
regions are predicted to have a higher likelihood to form 
hairpin structure due to self-complementarity. Therefore,
insertion of heterogeneous base pairs would reduce the
likelihood of self-complementary secondary structure for-
mation, which is known to inhibit transcription and/or
translation in some organisms. In most cases, the adverse
effects may be minimized by using sequences which do not 
contain more than five consecutive A+T or G+C. 
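
The scanning steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used by the inventors: the function names are hypothetical, the signal lists are copied from the plant entries of Table II, and the toy sequence is made up.

```python
import re

# Potential plant polyadenylation signals taken from Table II of this document.
MAJOR_PLANT_SIGNALS = ["AATAAA", "AATAAT"]
MINOR_PLANT_SIGNALS = [
    "AAGCAT", "ATTAAT", "AACCAA", "ATACAT", "ATATAA",
    "AAAATA", "AATCAA", "ATACTA", "ATAAAA", "ATGAAA",
]


def at_runs(seq, min_len=5):
    """Return (start, end) spans of runs of min_len or more consecutive A/T."""
    pattern = re.compile("[AT]{%d,}" % min_len)
    return [m.span() for m in pattern.finditer(seq.upper())]


def find_motifs(seq, motifs):
    """Return sorted (position, motif) hits, including overlapping ones."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        pos = seq.find(motif)
        while pos != -1:
            hits.append((pos, motif))
            pos = seq.find(motif, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGCAATAAATTTACCGGATTTAGC"  # made-up sequence, not from the B.t.k. gene
    print(at_runs(toy))                                    # A/T runs of 5 or more
    print(find_motifs(toy, MAJOR_PLANT_SIGNALS + MINOR_PLANT_SIGNALS))
    print(find_motifs(toy, ["ATTTA"]))                     # ATTTA sequences
```

The default of five reflects the text's "greater than four consecutive" A or T nucleotides; the 15 to 30 nucleotide flanking-region test of the second step is not modeled here.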

SYNTHETIC OLIGONUCLEOTIDES FOR 
MUTAGENESIS 

The oligonucleotides used in the mutagenesis are 
designed to maintain the proper amino acid sequence and 
reading frame and preferably to not introduce common 
restriction sites such as BglII, HindIII, SacI, KpnI, EcoRI, 
NcoI, PstI and SalI into the modified gene. These restriction 
sites are found in multi-linker insertion sites of cloning 
vectors such as plasmids pUC118 and pMON7258. Of 
course, the introduction of new polyadenylation signals, 
ATTTA sequences or consecutive stretches of more than five 
A+T or G+C, should also be avoided. The preferred size for 
the oligonucleotides is around 40-50 bases, but fragments 
ranging from 18 to 100 bases have been utilized. In most 
cases, a minimum of 5 to 8 base pairs of homology to the 
template DNA on both ends of the synthesized fragment are 
maintained to insure proper hybridization of the primer to 
the template. The oligonucleotides should avoid sequences 
longer than five base pairs A+T or G+C. Codons used in the 
replacement of wild-type codons should preferably avoid the 
TA or CG doublet wherever possible. Codons are selected 
from a plant preferred codon table (such as Table I below) 
so as to avoid codons which are rarely found in plant 
genomes, and efforts should be made to select codons to 
preferably adjust the G+C content to about 50%. 
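
Two of the design rules above (primer length and avoidance of common restriction sites) lend themselves to a mechanical check. The following Python sketch is an assumption-laden illustration: `check_oligo` and `restriction_sites_in` are hypothetical helper names, the recognition sequences are the standard ones for the enzymes named in the text, and the example primers are made up.

```python
# Standard recognition sequences for the enzymes named in the text.
# All eight sites are palindromic, so scanning one strand is sufficient.
RESTRICTION_SITES = {
    "BglII": "AGATCT",
    "HindIII": "AAGCTT",
    "SacI": "GAGCTC",
    "KpnI": "GGTACC",
    "EcoRI": "GAATTC",
    "NcoI": "CCATGG",
    "PstI": "CTGCAG",
    "SalI": "GTCGAC",
}


def restriction_sites_in(seq):
    """Return the sorted names of listed enzymes whose site occurs in seq."""
    seq = seq.upper()
    return sorted(name for name, site in RESTRICTION_SITES.items() if site in seq)


def check_oligo(seq, min_len=18, max_len=100):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if not min_len <= len(seq) <= max_len:
        problems.append("length outside %d-%d bases" % (min_len, max_len))
    for name in restriction_sites_in(seq):
        problems.append("contains %s site" % name)
    return problems


if __name__ == "__main__":
    print(check_oligo("ACGTGAATTCACGTACGTAC"))  # made-up 20-mer with an EcoRI site
    print(check_oligo("ACGTACGTACGTACGTACGT"))  # made-up 20-mer with no listed site
```

The other rules in the paragraph (reading frame, TA and CG doublets, homology at the primer ends, G+C content) depend on the template and are left out of this sketch.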

TABLE I 

Preferred Codon Usage in Plants 

Amino Acid    Codon    Percent Usage in Plants 

ARG           CGA        7 
              CGC       11 
              CGG        5 
              CGU       25 
              AGA       29 
              AGG       23 
LEU           CUA        8 
              CUC       20 
              CUG       10 
              CUU       28 
              UUA        5 
              UUG       30 
SER           UCA       14 
              UCC       26 
              UCG        3 
              UCU       21 
              AGC       21 
              AGU       15 
THR           ACA       21 
              ACC       41 
              ACG        7 
              ACU       31 
PRO           CCA       45 
              CCC       19 
              CCG        9 
              CCU       26 
ALA           GCA       23 
              GCC       32 
              GCG        3 
              GCU       41 
GLY           GGA       32 
              GGC       20 
              GGG       11 
              GGU       37 
ILE           AUA       12 
              AUC       45 
              AUU       43 
VAL           GUA        9 
              GUC       20 
              GUG       28 
              GUU       43 
LYS           AAA       36 
              AAG       64 
ASN           AAC       72 
              AAU       28 
GLN           CAA       64 
              CAG       36 
HIS           CAC       65 
              CAU       35 
GLU           GAA       48 
              GAG       52 
ASP           GAC       48 
              GAU       52 
TYR           UAC       68 
              UAU       32 
CYS           UGC       78 
              UGU       22 
PHE           UUC       56 
              UUU       44 
MET           AUG      100 
TRP           UGG      100 



Regions with many consecutive A+T bases or G+C bases 
are predicted to have a higher likelihood to form hairpin 
structures due to self-complementarity. Disruption of these 
regions by the insertion of heterogeneous base pairs is 
preferred and should reduce the likelihood of the formation 
of self-complementary secondary structures such as hairpins 
which are known in some organisms to inhibit transcription 
(transcriptional terminators) and translation (attenuators). 
However, it is difficult to predict the biological effect of a 
potential hairpin forming region. 

It is evident to those skilled in the art that while the above 
description is directed toward the modification of the DNA 
sequences of wild-type genes, the present method can be 
used to construct a completely synthetic gene for a given 
amino acid sequence. Regions with five or more consecutive 
A+T or G+C nucleotides should be avoided. Codons should 
be selected avoiding the TA and CG doublets in codons 
whenever possible. Codon usage can be normalized against 
a plant preferred codon usage table (such as Table I) and the 
G+C content preferably adjusted to about 50%. The result- 
ing sequence should be examined to ensure that there are 
minimal putative plant polyadenylation signals and ATTTA 
sequences. Restriction sites found in commonly used clon- 
ing vectors are also preferably avoided. However, placement 
of several unique restriction sites throughout the gene is 
useful for analysis of gene expression or construction of 
gene variants. 
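
As a rough illustration of how a coding sequence might be drafted from an amino acid sequence using Table I, the Python sketch below picks, for each residue, the most frequently used plant codon that contains no TA or CG doublet. It is a simplified, greedy sketch and not the design procedure of the patent: the function names are hypothetical, the usage figures are transcribed from Table I with T written for U, and the sketch ignores doublets formed across codon boundaries, G+C adjustment, polyadenylation signals and restriction sites.

```python
# Percent usage from Table I, keyed by one-letter amino acid code (T written for U).
PLANT_CODON_USAGE = {
    "R": {"CGA": 7, "CGC": 11, "CGG": 5, "CGT": 25, "AGA": 29, "AGG": 23},
    "L": {"CTA": 8, "CTC": 20, "CTG": 10, "CTT": 28, "TTA": 5, "TTG": 30},
    "S": {"TCA": 14, "TCC": 26, "TCG": 3, "TCT": 21, "AGC": 21, "AGT": 15},
    "T": {"ACA": 21, "ACC": 41, "ACG": 7, "ACT": 31},
    "P": {"CCA": 45, "CCC": 19, "CCG": 9, "CCT": 26},
    "A": {"GCA": 23, "GCC": 32, "GCG": 3, "GCT": 41},
    "G": {"GGA": 32, "GGC": 20, "GGG": 11, "GGT": 37},
    "I": {"ATA": 12, "ATC": 45, "ATT": 43},
    "V": {"GTA": 9, "GTC": 20, "GTG": 28, "GTT": 43},
    "K": {"AAA": 36, "AAG": 64},
    "N": {"AAC": 72, "AAT": 28},
    "Q": {"CAA": 64, "CAG": 36},
    "H": {"CAC": 65, "CAT": 35},
    "E": {"GAA": 48, "GAG": 52},
    "D": {"GAC": 48, "GAT": 52},
    "Y": {"TAC": 68, "TAT": 32},
    "C": {"TGC": 78, "TGT": 22},
    "F": {"TTC": 56, "TTT": 44},
    "M": {"ATG": 100},
    "W": {"TGG": 100},
}


def back_translate(protein):
    """Greedy draft: most-used codon per residue, avoiding TA/CG inside the codon."""
    codons = []
    for aa in protein.upper():
        usage = PLANT_CODON_USAGE[aa]
        clean = {c: u for c, u in usage.items() if "TA" not in c and "CG" not in c}
        pool = clean or usage  # fall back when every codon has a doublet (e.g. Tyr)
        codons.append(max(pool, key=pool.get))
    return "".join(codons)


def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    draft = back_translate("MKR")  # made-up tripeptide
    print(draft, round(gc_fraction(draft), 2))
```

A draft produced this way would still need to be examined against the remaining criteria in the paragraph above before being considered a candidate synthetic gene.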

Plant Gene Construction 

The expression of a plant gene which exists in double-stranded 
DNA form involves transcription of messenger RNA (mRNA) from 
one strand of the DNA by RNA polymerase enzyme, and the 
subsequent processing of the mRNA primary transcript inside 
the nucleus. This processing involves a 3' non-translated 
region which adds polyadenylate nucleotides to the 3' end of 
the RNA. Transcription of DNA into mRNA is regulated by a 
region of DNA usually referred to as the "promoter." The 
promoter region contains a sequence of bases that signals RNA 
polymerase to associate with the DNA and to initiate the 
transcription of mRNA using one of the DNA strands as a 
template to make a corresponding strand of RNA. 

A number of promoters which are active in plant cells 
have been described in the literature. These include the 
nopaline synthase (NOS) and octopine synthase (OCS) 
promoters (which are carried on tumor-inducing plasmids of 
Agrobacterium tumefaciens), the Cauliflower Mosaic Virus 
(CaMV) 19S and 35S promoters, the light-inducible promoter 
from the small subunit of ribulose bis-phosphate 
carboxylase (ssRUBISCO, a very abundant plant 
polypeptide) and the mannopine synthase (MAS) promoter 
(Velten et al. 1984 and Velten & Schell, 1985). All of these 
promoters have been used to create various types of DNA 
constructs which have been expressed in plants (see e.g., 
PCT publication WO84/02913 (Rogers et al., Monsanto)). 

Promoters which are known or are found to cause 
transcription of RNA in plant cells can be used in the present 
invention. Such promoters may be obtained from plants or 
plant viruses and include, but are not limited to, the 
CaMV35S promoter and promoters isolated from plant 
genes such as ssRUBISCO genes. As described below, it is 
preferred that the particular promoter selected should be 
capable of causing sufficient expression to result in the 
production of an effective amount of protein. 

The promoters used in the DNA constructs (i.e. chimeric 
plant genes) of the present invention may be modified, if 
desired, to affect their control characteristics. For example, 
the CaMV35S promoter may be ligated to the portion of the 
ssRUBISCO gene that represses the expression of 
ssRUBISCO in the absence of light, to create a promoter 
which is active in leaves but not in roots. The resulting 
chimeric promoter may be used as described herein. For 
purposes of this description, the phrase "CaMV35S" pro- 
moter thus includes variations of CaMV35S promoter, e.g., 
promoters derived by means of ligation with operator 
regions, random or controlled mutagenesis, etc. 
Furthermore, the promoters may be altered to contain mul- 
tiple "enhancer sequences" to assist in elevating gene 
expression. 

The RNA produced by a DNA construct of the present 
invention also contains a 5' non-translated leader sequence. 
This sequence can be derived from the promoter selected to 
express the gene, and can be specifically modified so as to 
increase translation of the mRNA. The 5' non-translated 
regions can also be obtained from viral RNAs, from suitable 
eukaryotic genes, or from a synthetic gene sequence. 
The present invention is not limited to constructs, as pre- 
sented in the following examples. Rather, the non-translated 
leader sequence can be part of the 5' end of the non- 
translated region of the coding sequence for the virus coat 
protein, or part of the promoter sequence, or can be derived 
from an unrelated promoter or coding sequence. In any case, 
it is preferred that the sequence flanking the initiation site 
conform to the translational consensus sequence rules for 
enhanced translation initiation reported by Kozak (1984). 

The DNA construct of the present invention also contains 
a modified or fully-synthetic structural coding sequence 
which has been changed to enhance the performance of the 
gene in plants. In a particular embodiment of the present 
invention the enhancement method has been applied to 
design modified and fully synthetic genes encoding the 
crystal toxin protein of Bacillus thuringiensis. The structural 
genes of the present invention may optionally encode a 
fusion protein comprising an amino-terminal chloroplast 
transit peptide or secretory signal sequence (see, for instance, 
Examples 10 and 11). 

The DNA construct also contains a 3' non-translated 
region. The 3' non-translated region contains a polyadeny- 
lation signal which functions in plants to cause the addition 
of polyadenylate nucleotides to the 3' end of the viral RNA. 
Examples of suitable 3' regions are (1) the 3' transcribed, 
non-translated regions containing the polyadenylation signal 
of Agrobacterium tumor-inducing (Ti) plasmid genes, such 
as the nopaline synthase (NOS) gene, and (2) plant genes 
like the soybean storage protein (7S) genes and the small 
subunit of the RuBP carboxylase (E9) gene. An example of 
a preferred 3' region is that from the 7S gene, described in 
greater detail in the examples below. 

Plant Transformation 

A chimeric plant gene containing a structural coding 
sequence of the present invention can be inserted into the 
genome of a plant by any suitable method. Suitable plants 
for use in the practice of the present invention include, but 
are not limited to, soybean, cotton, alfalfa, oilseed rape, flax, 
tomato, sugarbeet, sunflower, potato, tobacco, maize, rice 
and wheat. Suitable plant transformation vectors include 
those derived from a Ti plasmid of Agrobacterium 
tumefaciens, as well as those disclosed, e.g., by 
Herrera-Estrella (1983), Bevan (1983), Klee (1985) and EPO 
publication 120,516 (Schilperoort et al.). In addition to plant 
transformation vectors derived from the Ti or root-inducing 
(Ri) plasmids of Agrobacterium, alternative methods can 
be used to insert the DNA constructs of this invention into 
plant cells. Such methods may involve, for example, the use 
of liposomes, electroporation, chemicals that increase free 
DNA uptake, free DNA delivery via microprojectile 
bombardment, and transformation using viruses or pollen. 

A particularly useful Ti plasmid cassette vector for 
transformation of dicotyledonous plants is shown in FIG. 5. 
Referring to FIG. 5, the expression cassette pMON893 
consists of the enhanced CaMV35S promoter (EN 35S) and 
the 3' end including polyadenylation signals from a soybean 
gene encoding the alpha-prime subunit of beta-conglycinin. 
Between these two elements is a multilinker containing 
multiple restriction sites for the insertion of genes. 

The enhanced CaMV35S promoter was constructed as 
follows. A fragment of the CaMV35S promoter extending 
between position -343 and +9 was previously constructed in 
pUC13 by Odell et al. (1985). This segment contains a 
region identified by Odell et al. (1985) as being necessary 
for maximal expression of the CaMV35S promoter. It was 
excised as a ClaI-HindIII fragment, made blunt ended with 
DNA polymerase I (Klenow fragment) and inserted into the 
HincII site of pUC18. This upstream region of the 35S 
promoter was excised from this plasmid as a HindIII-EcoRV 
fragment (extending from -343 to -90) and inserted into the 
same plasmid between the HindIII and PstI sites. The 
enhanced CaMV35S promoter thus contains a duplication of 
sequences between -343 and -90 (Kay et al., 1987). 

The 3' end of the 7S gene is derived from the 7S gene 
contained on the clone designated 17.1 (Schuler et al., 
1982). This 3' end fragment, which includes the 
polyadenylation signals, extends from an AvaII site located 
about 30 bp upstream of the termination codon for the 
beta-conglycinin gene in clone 17.1 to an EcoRI site located 
about 450 bp 
downstream of this termination codon. 

The remainder of pMON893 contains a segment of 
pBR322 which provides an origin of replication in E. coli 
and a region for homologous recombination with the disarmed 
T-DNA in Agrobacterium strain ACO (described 
below); the oriV region from the broad host range plasmid 
RK2; the streptomycin/spectinomycin resistance gene from 
Tn7; and a chimeric NPTII gene, containing the CaMV35S 
promoter and the nopaline synthase (NOS) 3' end, which 
provides kanamycin resistance in transformed plant cells. 

Referring to FIG. 6, transformation vector plasmid 
pMON900 is a derivative of pMON893. The enhanced 
CaMV35S promoter of pMON893 has been replaced with 
the 1.5 kb mannopine synthase (MAS) promoter (Velten et 
al. 1984). The other segments are the same as plasmid 
pMON893. After incorporation of a DNA construct into 
plasmid vector pMON893 or pMON900, the intermediate 
vector is introduced into A. tumefaciens strain ACO which 
contains a disarmed Ti plasmid. Cointegrate Ti plasmid 
vectors are selected and used to transform dicotyledonous 
plants. 

Referring to FIG. 7, A. tumefaciens ACO is a disarmed 
strain similar to pTiB6SE described by Fraley et al. (1985). 
For construction of ACO the starting Agrobacterium strain 
was the strain A208 which contains a nopaline-type Ti 
plasmid. The Ti plasmid was disarmed in a manner similar 
to that described by Fraley et al. (1985) so that essentially all 
of the native T-DNA was removed except for the left border 
and a few hundred base pairs of T-DNA inside the left 
border. The remainder of the T-DNA extending to a point 
just beyond the right border was replaced with a novel piece 
of DNA including (from left to right) a segment of pBR322, 

TABLE III 

Mutagenesis Primers for B.t.k. HD-1 Gene 

Primer      Length (bp)   Sequence 

BTK185      18            TCCCCAGATA ADVTCAAC 
                          Sequence ID No. 1 
BTK240      48            GGCTTGATTC CTAGCGAACT CTTCOArrCT 
                          CTGOTTGATG AGCrOTTC 
                          Sequence ID No. 2 
BTK462      54            CAAAACTGAG AGGTGGAGGT TGGCAGCrTG 
                          AACXjTACACG GAGAGGAGAGGAAC 
                          Sequence ID No. 3 
BTK669      48            AGTTAGTGTA AGcrcrciTC TGAACTGGTT 
                          GTACCTGATC GAATCrcr 
                          Sequence ID No. 4 
BTK930      39            AGCCATGATC TGGTGACCGG ACCAGTAGTA 
                          TTCTCCTCT 
                          Sequence ID No. 5 
BTK1110     32            AGTTGTTGGT TGTTGATCCC GATGTTAAAA 
                          GG 
                          Sequence ID No. 6 
BTK1380A    37            GTGATGAAGG GATGATGTTG TTGAACrCAG 
                          CACTACG 
                          Sequence ID No. 7 
BTK1380T    100           CAGAAGTTCC AGAGCCAAGA TTAGTAGACr 
                          TGGTGAGTGG GATTTGGGTG ATTTGTGATG 
                          AAGGGATGAT GTrGTTGAAC TX:AGCACTAC 
                          GATGTArCCA 
                          Sequence ID No. 8 
BTK1600     27            TGATGTGTGG AACTGAAGGT TTGTGGT 
                          Sequence ID No. 9 



EXAMPLE 1 
Modified B.t.k. HD-1 Gene 

Referring to FIG. 2, the wild-type B.t.k. HD-1 gene is 
known to be expressed poorly in plants as a full length gene 
or as a truncated gene. The G+C content of the B.t.k. gene 
is low (37%) containing many A+T rich regions, potential 
polyadenylation sites (18 sites; see Table II for the list of 
sequences) and numerous ATTTA sequences. 

TABLE II 

List of Sequences of the Potential 
Polyadenylation Signals 

AATAAA*       AAGCAT 
AATAAT*       ATTAAT 
AACCAA        ATACAT 
ATATAA        AAAATA 
AATCAA        ATTAAA** 
ATACTA        AATTAA** 
ATAAAA        AATACA** 
ATGAAA        CATAAA** 

* indicates a potential major plant polyadenylation site. 
** indicates a potential minor animal polyadenylation site. 
All others are potential minor plant polyadenylation sites. 

Table III lists the synthetic oligonucleotides designed and 
synthesized for the site-directed mutagenesis of the B.t.k. 
HD-1 gene. 



the oriV region from plasmid RK2, and the kanamycin 
resistance gene from Tn601. The pBR322 and oriV seg- 
ments are similar to the segments in pMON893 and provide 
a region of homology for cointegrate formation. 

The following examples are provided to better elucidate 
the practice of the present invention and should not be 
interpreted in any way to limit the scope of the present 
invention. Those skilled in the art will recognize that various 
modifications, truncations etc. can be made to the methods 
and genes described herein while not departing from the 
spirit and scope of the present invention. 



The B.t.k. HD-1 gene (BglII fragment from pMON9921 
encoding amino acids 29-607 with a Met-Ala at the 
N-terminus) was cloned into pMON7258 (pUC118 derivative 
which contains a BglII site in the multilinker cloning 
region) at the BglII site resulting in pMON5342. The 
orientation of the B.t.k. gene was chosen so that the opposite 
strand (negative strand) was synthesized in filamentous 
phage particles for the mutagenesis. The procedure of 
Kunkel (1985) was used for the mutagenesis using plasmid 
pMON5342 as starting material. 

The regions for mutagenesis were selected in the 
following manner. All regions of the DNA sequence of the 
B.t.k. gene were identified which contained five or more 
consecutive base pairs which were A or T. These were 
ranked in terms of length and highest percentage of A+T in 
the surrounding sequence over a 20-30 base pair region. The 
DNA was then analysed for regions which might contain 
polyadenylation sites (see Table II above) or ATTTA 
sequences. Oligonucleotides were designed which maxi- 
mized the elimination of A+T consecutive regions which 
contained one or more polyadenylation sites or ATTTA 
sequences. Two potential plant polyadenylation sites were 
rated more critical (see Table II) based on published reports. 
Codons were selected which increased G+C content, did not 
generate restriction sites for enzymes useful for cloning and 
assembly of the modified gene (BamHI, BglII, SacI, NcoI, 
EcoRV) and did not contain the doublets TA or CG which 
have been reported to be infrequently found in codons in 
plants. The oligonucleotides were at least 18 bp long ranging 
up to 100 base pairs and contained at least 5-8 base pairs of 
direct homology to native sequences at the ends of the 
fragments for efficient hybridization and priming in site- 
directed mutagenesis reactions. FIG. 2 compares the wild- 
type B.t.k. HD-1 gene sequence with the sequence which 
resulted from the modifications by site-directed mutagen- 
esis. 

The end result of these changes was to increase the G+C 
content of B.t.k. gene from 37% to 41% while also decreasing 
the potential plant polyadenylation sites from 18 to 7 and 
decreasing the ATTTA regions from 13 to 7. Specifically, the 
mutagenesis changes from amino (5') terminus to the car- 
boxy (3') terminus are as follows: 

BTK185 is an 18-mer used to eliminate a plant polyade- 
nylation site in the midst of a nine base pair region of A+T. 

BTK240 is a 48-mer. Seven base pairs were changed by 
this oligonucleotide to eliminate three potential 
polyadenylation sites (2 AACCAA, 1 AATTAA). Another region close 
to the region altered by BTK240, starting at bp 312, had a 
high A+T content (13 of 15 base pairs) and an ATTTA 
region. However, it did not contain a potential polyadeny- 
lation site and its longest string of uninterrupted A+T was 
seven base pairs. 

BTK462 is a 54-mer introducing 13 base pair changes. 
The first six changes were to reduce the A+T richness of the 
gene by replacing wild-type codons with codons containing 
G and C while avoiding the CG doublet. The next seven 
changes made by BTK462 were used to eliminate an A+T 
rich region (13 of 14 base pairs were A or T) containing two 
ATTTA regions. 

BTK669 is a 48-mer making nine individual base pair 
changes eliminating three possible polyadenylation sites 
(ATATAA, AATCAA, and AATTAA) and a single ATTTA 
site. 

BTK930 is a 39-mer designed to increase the G+C 
content and to eliminate a potential polyadenylation site 
(AATAAT — a major site). This region did contain a nine 
base pair region of consecutive A+T sequence. One of the 
base pair changes was a G to A because a G at this position 
would have created a G+C rich region (CCGG(G)C). Since 
sequencing reactions indicate that there can be difficulties 
generating sequence through G+C consecutive bases, it was 
thought to be prudent to avoid generating potentially prob- 
lematic regions even if they were problematic only in vitro. 

BTK1110 is a 32-mer designed to introduce five changes 
in the wild-type gene. One potential site (AATAAT — a 
major site) was eliminated in the midst of an A+T rich region 
(19 of 22 base pairs). 

BTK1380A and BTK1380T are responsible for 14 individual 
base pair changes. The first region (1380A) has 17 
consecutive A+T base pairs. In this region is an ATTTA and 
a potential polyadenylation site (AATAAT). The 100-mer 
(1380T) contains all the changes dictated by 1380A. The 
large size of this primer was in part an experiment to 
determine if it was feasible to utilize large oligonucleotides 
for mutagenesis (over 60 bases in length). A second 
consideration was that the 100-mer was used to mutagenize a 
template which had previously been mutagenized by 1380A. 
The original primer ordered to mutagenize the region down- 
stream and adjacent to 1380A did not anneal efficiently to 
the desired site as indicated by an inability to obtain clean 
sequence utilizing the primer. The large region of homology 
of 1380T did assure proper annealing. The extended size of 
1380T was more of a convenience rather than a necessity. 
The second region adjacent to 1380A covered by 1380T has 
a high A+T content (22 of 29 bases are A or T). 

BTK1600 is a 27-mer responsible for five individual base 
pair changes. An ATTTA region and a plant polyadenylation 
site were identified and the appropriate changes engineered. 

A total of 62 bases were changed by site-directed 
mutagenesis. The G+C content increased by 55 base pairs, 
the potential polyadenylation sites were reduced from 18 to 
seven and the ATTTA sequences decreased from 13 to seven. 
The changes in the DNA sequence resulted in changes in 55 
of the 579 codons in the truncated B.t.k. gene in pMON5342 
(approximately 9.5%). 
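
Summary figures of this kind (bases changed, codons changed, net change in G+C) can be tallied mechanically from two aligned coding sequences. The Python sketch below shows one way to do so; `compare_genes` is a hypothetical helper, and the two short sequences are made-up examples rather than fragments of the B.t.k. gene.

```python
def compare_genes(wild_type, modified):
    """Tally differences between two in-frame sequences of equal length."""
    if len(wild_type) != len(modified) or len(wild_type) % 3 != 0:
        raise ValueError("sequences must be the same length and a multiple of 3")

    def gc_count(seq):
        return seq.count("G") + seq.count("C")

    # Count positions where the two sequences differ.
    base_changes = sum(a != b for a, b in zip(wild_type, modified))
    # Count codons (consecutive triplets) that differ in at least one base.
    codon_changes = sum(
        wild_type[i:i + 3] != modified[i:i + 3]
        for i in range(0, len(wild_type), 3)
    )
    return {
        "base_changes": base_changes,
        "codon_changes": codon_changes,
        "codons": len(wild_type) // 3,
        "gc_delta": gc_count(modified) - gc_count(wild_type),
    }


if __name__ == "__main__":
    # Made-up 4-codon example: Met-Lys-Leu-Asp encoded two different ways.
    print(compare_genes("ATGAAATTAGAT", "ATGAAGTTGGAC"))
    # The percentage quoted in the text: 55 of 579 codons.
    print(round(100 * 55 / 579, 1))
```

Applied to the full wild-type and modified sequences of FIG. 2, a tally of this form would be expected to reproduce the counts stated above, although that has not been checked here.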

Referring to Table IV, modified B.t.k. HD-1 genes were 
constructed that contained all of the above modifications 
(pMON5370) or various subsets of individual modifications. 
These genes were inserted into pMON893 for plant trans- 
formation and tobacco plants containing these genes were 
analyzed. The analysis of tobacco plants with the individual 
modifications was undertaken for several reasons. Expression 
of the wild-type truncated gene in tobacco is very poor, 
resulting in infrequent identification of plants toxic to THW. 
Toxicity is defined by leaf feeding assays as at least 60% 
mortality of tobacco hornworm neonate larvae with a damage 
rating of 1 or less (scale is 0 to 4; 0 is equivalent to total 
protection, 4 total damage). The modified HD-1 gene 
(pMON5370) shows a large increase in expression 
(estimated to be approximately 100-fold; see Table VIII) in 
tobacco. Therefore, increases in expression of the wild-type 
gene due to individual modifications would be apparent as 
a large increase in the frequency of toxic tobacco plants and 
the presence of detectable B.t.k. protein. Results are shown 
in the following table: 

TABLE IV 

Relative Effects of Regional Modifications 
within the B.t.k. Gene 

Construct     Position Modified              # of Plants   # of Toxic Plants 

pMON5370      185, 240, 669, 930,            38            22 
              1110, 1380a+b, 1600 
pMON10707     185, 240, 462, 669             48            19 
pMON10706     930, 1110, 1380a+b, 1600       43             1 
pMON10539     185                            55             2 
pMON10537     240                            57            17 
pMON10540     185, 240                       88            23 
pMON10705     462                            47             1 
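
To make the comparison in Table IV easier to read, the counts can be converted to the fraction of toxic plants per construct. The short Python sketch below does this; the dictionary simply transcribes the table, the flag marking which constructs carry the 240 change is taken from the "Position Modified" column, and `toxic_fraction` is a hypothetical helper name.

```python
# construct: (plants analyzed, toxic plants, carries the 240 change), from Table IV
TABLE_IV = {
    "pMON5370": (38, 22, True),
    "pMON10707": (48, 19, True),
    "pMON10706": (43, 1, False),
    "pMON10539": (55, 2, False),
    "pMON10537": (57, 17, True),
    "pMON10540": (88, 23, True),
    "pMON10705": (47, 1, False),
}


def toxic_fraction(construct):
    """Fraction of analyzed plants scored as toxic for one construct."""
    plants, toxic, _ = TABLE_IV[construct]
    return toxic / plants


if __name__ == "__main__":
    # Print constructs from highest to lowest toxic fraction.
    for name in sorted(TABLE_IV, key=toxic_fraction, reverse=True):
        has_240 = TABLE_IV[name][2]
        print("%-10s %5.1f%%  240 change: %s"
              % (name, 100 * toxic_fraction(name), has_240))
```

On these counts every construct carrying the 240 change yields more than a quarter toxic plants, while every construct lacking it yields under 4%.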



The effects of each individual oligonucleotide's changes 
on expression did reveal some overall trends. Six different 
constructs were generated which were designed to identify 
the key regions. The nine different oligonucleotides were 
divided in half by their position on the gene. Changes in the 
N-terminal half were incorporated into pMON10707 (185, 
240, 462, 669). C-terminal half changes were incorporated 
into pMON10706 (930, 1110, 1380a+b, 1600). The results 
of analysis of plants with these two constructs indicate that 
pMON10707 produces a substantial number of toxic plants 
(19 of 48). Protein from these plants is detectable by ELISA 
analysis. pMON10706 plants were rarely identified as 
insecticidal (1 of 43) and the levels of B.t.k. were barely 
detectable by immunological analysis. Investigation of the 
N-terminal changes in greater detail was done with 4 pMON 
constructs: 10539 (185 alone), 10537 (240 alone), 10540 
(185 and 240) and 10705 (462 alone). The results indicate 
that the presence of the changes in 240 were required to 
generate a substantial number of toxic plants (pMON10540, 
23 of 88; pMON10537, 17 of 57). The absence of the 240 
changes resulted in a low frequency of toxic plants with low 
B.t.k. protein levels, identical to results with the wild type 
gene. These results indicate that the changes in 240 are 
responsible for a substantial increase in B.t.k. expression 
levels over an analogous wild-type construct in tobacco. 
Changes in additional regions (185, 462, 669) in conjunction 
with 240 may result in increases in B.t.k. expression (>2 
fold). However, changes at the 240 region of the N-terminal 
portion of the gene do result in dramatic increases in 
expression. 

Despite the importance of the alteration of the 240 region 
in expression of modified genes, increased expression can be 
achieved by alteration of other regions. Hybrid genes, part 
wild-type, part synthetic, were generated to determine the 
effects of synthetic gene segments on the levels of B.t.k. 
expression. A hybrid gene was generated with a synthetic 
N-terminal third (base pair 1 to 590 of FIG. 2: to the XbaI 
site) with the C-terminal wild type B.t.k. HD-1 
(pMON5378). Plants transformed with this vector were as 
toxic as plants transformed with the modified HD-1 gene 
(pMON5370). This is consistent with the alteration of the 
240 region. However, pMON10538, a hybrid with a wild-type 
N-terminal third (wild type gene for the first 600 base 
pairs, to the second XbaI site) and a synthetic C-terminal last 
two-thirds (base pair 590 to 1845 of FIG. 3) was used to 
transform tobacco and resulted in a dramatic increase in 
expression. The levels of expression do not appear to be as 
high as those seen with the synthetic gene, but are comparable 
to the modified gene levels. These results indicate that 
modification of the 240 segment is not essential to increased 
expression since pMON10538 has an intact 240 region. A 
fully synthetic gene is, in most cases, superior for expression 
levels of B.t.k. (See Example 2.) 

EXAMPLE 2 

Fully Synthetic B.t.k. HD-1 Gene 

A synthetic B.t.k. HD-1 gene was designed using the 
preferred plant codons listed in Table V below. Table V lists 
the codons and frequency of use in plant genes of 
dicotyledonous plants compared to the frequency of their use in 
the wild type B.t.k. HD-1 gene (amino acids 1-615) and the 
synthetic gene of this example. The total number of each 
amino acid in this segment of the gene is listed in the 
parentheses under the amino acid designation. 



TABLE V 

Codon Usage in Synthetic B.t.k. HD-1 Gene 

                        Percent Usage in 
Amino Acid    Codon     Plants    Wt B.t.k.    Syn 

ARG           CGA         7         11           2 
              CGC        11          5           5 
              CGG         5          2           0 
              CGU        25         14          27 
              AGA        29         55          41 
              AGG        23         14          25 
LEU           CUA         8         16           4 
              CUC        20          0          20 
              CUG        10          2 
              CUU        28         22          24 
              UUA         5         50           0 
              UUG        30         10          45 
SER           UCA        14         27           5 
              UCC        26          9          28 
              UCG         3          g           0 
              UCU        21         19          31 
              AGC        21          5          32 
              AGU        15         31           5 
THR           ACA        21         31          14 
              ACC        41         19          53 
              ACG         7         14           0 
              ACU        31         36          33 
PRO           CCA        45         35          53 
              CCC        19          6          12 
              CCG         9         21           3 
              CCU        26         38          32 
ALA           GCA        23         38          26 
              GCC        32          9          29 
              GCG         3          3           0 
              GCU        41         50          45 
GLY           GGA        32         52          45 
              GGC        20         17          15 
              GGG        11         15           5 
              GGU        37         15          34 
ILE           AUA        12         39           2 
y^o)          AUC        45         11          67 
              AUU        43         50          30 
VAL           GUA         9         45           3 
              GUC        20          5          16 
              GUG        28         11          37 
              GUU        43         39          45 
LYS           AAA        36        100          33 
(3)           AAG        64          0          67 
ASN           AAC        72         27          80 
(44)          AAU        28         73          20 
GLN           CAA        64         77          61 
(31)          CAG        36         23          39 
HIS           CAC        65          0          80 
(10)          CAU        35        100          20 
GLU           GAA        48         87          50 
(30)          GAG        52         13          50 
ASP           GAC        48         17          65 
(23)          GAU        52         83          35 
TYR           UAC        68         20          72 
(25)          UAU        32         80          28 
CYS           UGC        78         50         100 
(2)           UGU        22         50           0 
PHE           UUC        56         17          83 
(36)          UUU        44         83          17 
MET           AUG       100        100         100 
(9) 
TRP           UGG       100        100         100 
(9) 











The resulting synthetic gene lacks ATTTA sequences, 
contains only one potential polyadenylation site and has a 
G+C content of 48.5%. FIG. 3 is a comparison of the 
wild-type HD-1 sequence to the synthetic gene sequence for 
amino acids 1-615. There is approximately 77% DNA 
homology between the synthetic gene and the wild-type 
gene and 356 of the 615 codons have been changed 
(approximately 60%). 
EXAMPLE 3 

Synthetic B.t.k. HD-73 Gene 

The crystal protein toxin from B.t.k. HD-73 exhibits a 
higher unit activity against some important agricultural 
pests. The toxin proteins of HD-1 and HD-73 exhibit 
substantial homology (~90%) in the N-terminal 450 amino 
acids, but differ substantially in the amino acid region 
451-615. Fusion proteins comprising amino acids 1-450 of 
HD-1 and 451-615 of HD-73 exhibit the insecticidal 
properties of the wild-type HD-73. The strategy employed was to 
use the 5' two-thirds of the synthetic HD-1 gene (first 1350 
bases, up to the SacI site) and to dramatically modify the 
final 590 bases (through amino acid 645) of the HD-73 in a 
manner consistent with the algorithm used to design the 
synthetic HD-1 gene. Table VI below lists the 
oligonucleotides used to modify the HD-73 gene in the order used in the 
gene from 5' to 3' end. Nine oligonucleotides were used in 
a 590 base pair region, each oligonucleotide ranging in size from 
33 to 60 bases. The only regions left unchanged were areas 
where there were no long consecutive strings of A or T bases 
(longer than six). All polyadenylation sites and ATTTA sites 
were eliminated. 
codons in the synthetic B.t.k. HD-73 differ from the 
analogous segment of the wild-type HD-73 gene. 

A one base pair deletion in the synthetic HD-73 gene was 
detected in the course of sequencing the 3' end at base pair 
1890. This results in a frame-shift mutation at amino acid 
625 with a premature stop codon at amino acid 640 
(pMON5379). Table VII below compares the codon usage of 
the wild-type gene of B.t.k. HD-73 versus the synthetic gene 
of this example for amino acids 451-645 and codon usage 
of naturally occurring genes of dicotyledonous plants. The 
total number of each amino acid encoded in this segment of 
the gene is found in the parentheses under the amino acid 
designation. 

TABLE VII 



TABLE VI 

Mutagenesis Primers for B.t.k. HD-73 

Primer      Length (bp)   Sequence 

73K1363     51            AATACTArCG GATGCGATGA TGTTGTTGAA 
                          CrCAGCACTA CGOTGTATCC A 
                          Sequence ID No. 10 
73K1437     33            TCCTGAAATG ACAGAACCGT TGAAGAGAAA 
                          GTT 
                          Sequence ID No. 11 
73K1471     48            AnrCCACTG CTGTTGAGTC TAACXjAGGTC 
                          TCCACXrAGTG AATCCTGG 
                          Sequence ID No. 12 
73K1561     60            GTGAATAGGG GTCACAGAAG CATACCTCAC 
                          ACGAACTCTA TATCTGGTAG ATGirGGArGG 
                          Sequence ID No. 13 
73K1642     33            TGTACCTCGA ACrGTATTGG AGAAGATGGA 
                          TGA 
                          Sequence ID No. 14 
73K1675     48            TTCAAAGTAA CCGAAATCGC TGGATTGGAG 
                          ATTATCCAAG GAGGTAGC 
                          Sequence ID No. 15 
73K1741     39            ACTAAAGTTT CrAACACCX:A CGArGTTACC 
                          GAGTGAAGA 
73K1797     36            AACTGGAATG AACrCGAATC TGTCGATAAT 
                          CACTCC 
                          Sequence ID No. 16 
73KTERM     54            GGACACTAGA TCTTAGTGAT AATCGGTCAC 
                          AnTGTCTTG AGTCCAAGCr CGTT 



The resulting gene has two potential polyadenyiation sites 
(compared to 18 in the WT) and no AiTT A sequence (12 in 
the WT). The G+C content has increased from 37% to 48%. 
A total of 59. individual base pair changes were made using 55 
the primers in Table VI . Overall, there is 90% DNA homol- 
ogy between the region of the HD-73 gene modified by site 
directed mutagenesis and the wild-type sequence of the 
analogous region of HD-73. The synthetic HD-73 is a hybrid 
of the first 1360 bases from the synthetic HD-1 and the next 60 
590 bases or so modified HD-73 sequence. FIG. 4 is a 
comparison of the above-described synthetic B.t.k. HD-73 
and the wild-type B.t.k. HD-73 encoding amino acids 1—645. 
In the modified region of the HD-73 gene 44 of the 170 
codons (25%) were changed as a result of the site-directed 65 
mutagenesis changes resulting from the oligonucleotides 
found in Table VI. Overall, approximately 50% of the 



Codon Usage in Synthetic B.t.k. HD-73 Gene

                           Percent Usage in
Amino Acid    Codon    Plants        Wt HD-73      Syn

ARG           CGA      7             10            [illegible]
(10)          CGC      11            0             3
              CGG      [illegible]   [illegible]   0
              CGU      25            20            23
              AGA      29            60            62
              AGG      23            0             3

LEU           CUA      8             25            3
(12)          CUC      20            [illegible]   58
              CUG      10            17            3
              CUU      [illegible]   [illegible]   0
              UUA      [illegible]   33            3
              UUG      30            0             17

SER           UCA      14            24            15
(21)          UCC      26            10            27
              UCG      [illegible]   [illegible]   0
              UCU      [illegible]   [illegible]   15
              AGC      21            0             14
              AGU      [illegible]   [illegible]   23

THR           ACA      [illegible]   [illegible]   38
(15)          ACC      41            3             31
              ACG      7             13            0
              ACU      31            27            31

PRO           CCA      45            71            71
([illegible]) CCC      19            0             0
              CCG      9             4             0
              CCU      26            14            29

ALA           GCA      23            29            31
(14)          GCC      32            7             8
              GCG      3             21            15
              GCU      41            43            46

GLY           GGA      32            33            43
(15)          GGC      20            0             0
              GGG      11            27            14
              GGU      37            40            43

ILE           AUA      12            33            7
(15)          AUC      45            7             40
              AUU      43            60            53

VAL           GUA      9             40            7
(15)          GUC      20            0             7
              GUG      28            20            36
              GUU      43            40            50

LYS           AAA      36            67            100
(3)           AAG      64            33            0

ASN           AAC      72            20            53
(20)          AAU      23            80            47

GLN           CAA      64            60            67
(5)           CAG      36            40            33

HIS           CAC      65            67            100
(3)           CAU      35            33            0

GLU           GAA      48            86            57
(7)           GAG      52            14            43

ASP           GAC      48            40            50
(5)           GAU      52            60            50

TYR           UAC      68            0             20
(5)           UAU      32            100           80

CYS           UGC      78            0             0
(0)           UGU      22            0             0

PHE           UUC      56            8             67
(13)          UUU      44            92            33

MET           AUG      100           100           100
(2)

TRP           UGG      100           100           100
(2)
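
A codon usage comparison of this kind can be tabulated directly from a coding sequence. The Python sketch below is a minimal illustration, assuming a coding sequence supplied in frame and containing only unambiguous bases; the function name and the toy sequence are hypothetical and are not taken from the genes of this example.

```python
from collections import Counter, defaultdict

# Standard genetic code, RNA bases in the order U, C, A, G (first base slowest).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def codon_usage(cds: str):
    """Map each codon present to (amino acid, percent usage among the
    codons observed for that amino acid), rounded to whole percent."""
    rna = cds.upper().replace("T", "U")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3)]
    counts = Counter(codons)
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: (CODON_TABLE[codon], round(100 * n / totals[CODON_TABLE[codon]]))
            for codon, n in counts.items()}

# Hypothetical toy coding sequence: Lys Lys Lys Leu Leu Leu.
usage = codon_usage("AAAAAGAAACTCCTCTTG")
for codon in sorted(usage):
    print(codon, usage[codon])
```

Applying the function separately to the wild-type and synthetic segments would yield the two gene-specific columns of such a table; the plant column comes from published compilations and is not computed here.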



Another truncated synthetic HD-73 gene was constructed. The sequence
of this synthetic HD-73 gene is identical to that of the above
synthetic HD-73 gene in the region in which they overlap (amino acids
29-615), and it also encodes Met-Ala at the N-terminus. FIG. 8 shows a
comparison of this truncated synthetic HD-73 gene with the N-terminal
Met-Ala versus the wild-type HD-73 gene.

While the previous examples have been directed at the preparation of
synthetic and modified genes encoding truncated B.t.k. proteins,
synthetic or modified genes can also be prepared which encode full
length toxin proteins.

One full length B.t.k. gene consists of the synthetic HD-73 sequence
of FIG. 4 from nucleotide 1-1845 plus wild-type HD-73 sequence
encoding amino acids 616 to the C-terminus of the native protein.
FIG. 9 shows a comparison of this synthetic/wild-type full length
HD-73 gene versus the wild-type full length HD-73 gene.

Another full length B.t.k. gene consists of the synthetic HD-73
sequence of FIG. 4 from nucleotide 1-1845 plus a modified HD-73
sequence encoding amino acids 616 to the C-terminus of the native
protein. The C-terminal portion has been modified by site-directed
mutagenesis to remove putative polyadenylation signals and ATTTA
sequences according to the algorithm of FIG. 1. FIG. 10 shows a
comparison of this synthetic/modified full length HD-73 gene versus
the wild-type full length HD-73 gene.

Another full length B.t.k. gene consists of a fully synthetic HD-73
sequence which incorporates the synthetic HD-73 sequence of FIG. 4
from nucleotide 1-1845 plus a synthetic sequence encoding amino acids
616 to the C-terminus of the native protein. The C-terminal synthetic
portion has been designed to eliminate putative polyadenylation
signals and ATTTA sequences and to include plant preferred codons.
FIG. 11 shows a comparison of this fully synthetic full length HD-73
gene versus the wild-type full length HD-73 gene.

Alternatively, another full length B.t.k. gene consists of a fully
synthetic sequence comprising base pairs 1-1830 of B.t.k. HD-1
(FIG. 3) and base pairs 1834-3534 of B.t.k. HD-73 (FIG. 11)
(SEQ ID NO:27).

EXAMPLE 4 

Expression of Modified and Synthetic B.t.k. HD-1
and Synthetic HD-73

A number of plant transformation vectors for the expression of B.t.k.
genes were constructed by incorporating the structural coding
sequences of the previously described genes into plant transformation
cassette vector pMON893. The respective intermediate transformation
vector is inserted into a suitable disarmed Agrobacterium vector such
as A. tumefaciens ACO, supra. Tissue explants are cocultured with the
disarmed Agrobacterium vector and plants regenerated under selection
for kanamycin resistance using known protocols: tobacco (Horsch et
al., 1985); tomato (McCormick et al., 1986) and cotton (Trolinder et
al., 1987).

a) Tobacco.

The level of B.t.k. HD-1 protein in transgenic tobacco plants
containing pMON9921 (wild type truncated), pMON5370 (modified HD-1,
Example 1, FIG. 2) and pMON5377 (synthetic HD-1, Example 2, FIG. 3)
was analyzed by Western analysis. Leaf tissue was frozen in liquid
nitrogen, ground to a fine powder and then ground in a 1:2 (wt:volume)
ratio of SDS-PAGE sample buffer. Samples were frozen on dry ice, then
incubated for 10 minutes in a boiling water bath and microfuged for 10
minutes. The protein concentration of the supernatant was determined
by the method of Bradford (Anal. Biochem. 72:248-254). Fifty μg of
protein was run per lane on 9% SDS-PAGE gels, the protein transferred
to nitrocellulose and the B.t.k. HD-1 protein visualized using
antibodies produced against B.t.k. HD-1 protein as the primary
antibody and alkaline phosphatase conjugated second antibody as
described by the manufacturer (Promega, Madison, Wis.). Purified HD-1
tryptic fragment was used as the control. Whereas the B.t.k. protein
from tobacco plants containing pMON9921 was below the level of
detection, the B.t.k. protein from plants containing the modified
(pMON5370) and synthetic (pMON5377) genes was easily detected. The
B.t.k. protein from plants containing pMON9921 remained undetectable,
even with 10 fold longer incubation times. The relative levels of
B.t.k. HD-1 protein in these plants are estimated in Table VIII.
Because the protein from plants containing pMON9921 was not observed,
the level of protein in these plants was estimated from the relative
mRNA levels (see below). Plants containing the modified gene
(pMON5370) expressed approximately 100 fold more B.t.k. protein than
plants containing the wild-type gene (pMON9921). Plants containing the
fully synthetic B.t.k. HD-1 gene (pMON5377) expressed approximately
five fold more protein than plants containing the modified gene. The
modified gene contributes the majority of the increase in B.t.k.
expression observed. The plants used to generate the above data are
the best representatives from each construct based either on a tobacco
hornworm bioassay or on data derived from previous Western analysis.

TABLE VIII

Expression of B.t.k. HD-1 Protein in Transgenic Tobacco

Gene                        B.t.k. Protein    Fold Increase in
Description    Vector       Concentration*    B.t.k. Expression

Wild type      pMON9921     10                1
Modified       pMON5370     1000              100
Synthetic      pMON5377     5000              500

*B.t.k. protein concentrations are expressed in ng/mg of total soluble
protein. The level of B.t.k. protein for plants containing the wild
type gene is estimated from mRNA levels.

Plants containing these genes were tested for bioactivity to determine
whether the increased quantities of protein observed by Western
analysis result in a corresponding increase in bioactivity. Leaves
from the same plants used for the Western data in Table VIII were
tested for bioactivity against two insects. A detached leaf bioassay
was first done using tobacco hornworm, an extremely sensitive
lepidopteran insect. Leaves from all three transgenic tobacco plants
were totally protected and 100% mortality of tobacco hornworm was
observed (see Table IX below). A much less sensitive insect, beet
armyworm, was then used in another detached leaf bioassay. Beet
armyworm is approximately 500 fold less sensitive to B.t.k. HD-1
protein than tobacco hornworm. The difference in sensitivity of these
two insects was determined using purified HD-1 protein in a diet
incorporation assay (see below). Plants containing the wild-type gene
(pMON9921) showed only minimal protection against beet armyworm,
whereas plants containing the modified gene showed almost complete
protection and plants containing the fully synthetic gene were totally
protected against beet armyworm damage. The results of these bioassays
confirm the levels of B.t.k. HD-1 expression observed in the Western
analysis and demonstrate that the increased levels of B.t.k. HD-1
protein correlate with increased insecticidal activity.

TABLE IX

Protection of Tobacco Plants from
Tobacco Hornworm and Beet Armyworm

Gene                        Tobacco Hornworm    Beet Armyworm
Description    Vector       Damage*             Damage*

None           None         NL                  NL
Wild type      pMON9921     0                   3
Modified       pMON5370     0                   1
Synthetic      pMON5377     0                   0

*Extent of insect damage was rated: 0, no damage; 1, slight; 2,
moderate; 3, severe; or NL, no leaf left.

The bioactivity of the B.t.k. HD-1 protein produced by these
transgenic plants was further investigated to more accurately
quantitate the relative activities. Leaf tissue from tobacco plants
containing the wild-type, modified and synthetic genes was ground in
100 mM sodium carbonate buffer, pH 10 at a 1:2 (wt:vol) ratio.
Particulate material was removed by centrifugation. The supernatant
was incorporated into a synthetic diet similar to that described by
Marrone et al. (1985). The diet medium was prepared the day of the
test with the plant extract solutions incorporated in place of the 20%
water component. One ml of the diet was aliquoted into 96 well plates.

After the diet dried, one neonate tobacco budworm larva was added to
each well. Sixteen insects were tested with each plant sample. The
plates were incubated at 27° C. After seven days, the larvae from each
treatment were combined and weighed on an analytical balance. The
average weight per insect was calculated and compared to a standard
curve relating B.t.k. protein concentrations to average larval weight.
Insect weight was inversely proportional (in a logarithmic manner) to
the relative increase in B.t.k. protein concentration. The amount of
B.t.k. HD-1 protein, based on the extent of larval growth inhibition,
was determined for two different plants containing each of the three
genes. The specific activity (ng of B.t.k. HD-1 per mg of plant
protein) was determined for each plant. Plants containing the modified
HD-1 gene (pMON5370) averaged approximately 1400 ng (1200 and 1600 ng)
of B.t.k. HD-1 per mg of plant extract protein. This value compares
closely with the 1000 ng of B.t.k. HD-1 protein per mg of plant
extract protein as determined by Western analysis (Table VIII). B.t.k.
HD-1 concentrations for the plants containing the synthetic HD-1 gene
averaged approximately 8200 ng (7200 and 9200 ng) of B.t.k. HD-1
protein per mg of plant extract protein. This number compares well to
the 5000 ng of HD-1 protein per mg of plant extract protein estimated
by Western analysis. Likewise, plants containing the synthetic gene
showed approximately a six-fold higher specific activity than the
corresponding plants containing the modified gene for these bioassays.
In the Western analysis the ratio was approximately 5 fold; again,
both are in good agreement. The level of B.t.k. protein in plants
containing the wild-type HD-1 gene (pMON9921) was too low to give a
significant decrease in larval weight and hence was below a level that
could be quantitated in this assay. In conclusion, the levels of
B.t.k. HD-1 protein determined by both the bioassays and the Western
analysis for these plants containing the modified and synthetic genes
agree, which demonstrates that the B.t.k. HD-1 protein produced by
these plants is biologically active.
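
The averages and the fold difference quoted for the diet incorporation bioassay can be reproduced with simple arithmetic. The Python sketch below uses only the per-plant values stated in the text; the helper for reading a concentration off a log-linear standard curve is a hypothetical illustration, and its intercept and slope are placeholders rather than measured values.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Bioassay estimates quoted in the text
# (ng B.t.k. HD-1 per mg of plant extract protein, two plants per gene).
modified = [1200, 1600]    # plants containing pMON5370
synthetic = [7200, 9200]   # plants containing pMON5377

# Fold difference in specific activity, synthetic over modified.
ratio = mean(synthetic) / mean(modified)

def conc_from_weight(weight, intercept, slope):
    """Invert an assumed log-linear standard curve of the form
    weight = intercept - slope * log10(concentration).
    The curve shape and parameters are illustrative assumptions."""
    return 10 ** ((intercept - weight) / slope)

print(mean(modified), mean(synthetic), round(ratio, 1))
```

The ratio of the two averages is slightly under six, consistent with the "approximately six-fold" figure given for the bioassay.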

The levels of mRNA were determined in the plants containing the
wild-type B.t.k. HD-1 gene (pMON9921) and the modified gene (pMON5370)
to establish whether the increased levels of protein production result
from increased transcription or translation. mRNA from plants
containing the synthetic gene could not be analyzed directly with the
same DNA probe as used for the wild-type and modified genes because of
the numerous changes made in the coding sequence. mRNA was isolated
and hybridized with a single-stranded DNA probe homologous to
approximately the 5' 90 bp of the wild-type or modified gene coding
sequences. The hybrids were digested with S1 nuclease and the
protected probe fragments analyzed by gel electrophoresis. Because the
procedure used a large excess of probe and long hybridization time,
the amount of protected probe is proportional to the amount of B.t.k.
mRNA present in the sample. Two plants expressing the modified gene
(pMON5370) were found to produce up to ten-fold more RNA than a plant
expressing the wild-type gene (pMON9921).

The increased mRNA level from the modified gene is consistent with the
result expected from the modifications introduced into this gene.
However, this 10 fold increase in mRNA with the modified gene compared
to the wild-type gene is in contrast to the 100 fold increase in
B.t.k. protein from these genes in tobacco plants. If the two mRNAs
were equally well translated then a 10 fold increase in stable mRNA
would be expected to yield a 10 fold increase in protein. The higher
increase in protein indicates that the modified gene mRNA is
translated at about a 10 fold higher efficiency than wild-type. Thus,
about half of the total effect on gene expression can be explained by
changes in mRNA levels and about half by changes in translational
efficiency. This increase in translational efficiency is striking in
that only about 9.5% of the codons have been changed in the modified
gene; that is, this effect is clearly not due to wholesale codon usage
changes. The increased translational efficiency could be due to
changes in mRNA secondary structure that affect translation or to the
removal of specific translational blockades due to specific codons
that were changed.
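
The attribution of the protein increase to mRNA level and translational efficiency can be written out as a short calculation. The Python sketch below assumes, for illustration, that the protein fold increase is the product of the mRNA fold increase and a translational efficiency factor, and that "about half" refers to shares on a logarithmic scale; the function names are hypothetical.

```python
import math

def translational_fold(protein_fold: float, mrna_fold: float) -> float:
    """Fold increase in translational efficiency implied when
    protein_fold = mrna_fold * translational_fold."""
    return protein_fold / mrna_fold

def log_share(component_fold: float, total_fold: float) -> float:
    """Fraction of the total fold increase explained by one component,
    measured on a logarithmic scale."""
    return math.log(component_fold) / math.log(total_fold)

# Tobacco, modified versus wild-type gene: 10 fold mRNA, 100 fold protein.
t_fold = translational_fold(100, 10)
mrna_share = log_share(10, 100)
translation_share = log_share(t_fold, 100)
print(t_fold, mrna_share, translation_share)
```

With the tobacco values, the implied translational factor is 10 fold and each component accounts for half of the increase on the logarithmic scale.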

The increased expression seen with the synthetic HD-1 gene was also
seen with a synthetic HD-73 gene in tobacco. B.t.k. HD-73 was
undetected in extracts of tobacco plants containing the wild-type
truncated HD-73 gene (pMON5367), whereas B.t.k. HD-73 protein was
easily detected in extracts from tobacco plants containing the
synthetic HD-73 gene of FIG. 4 (pMON5383). Approximately 1000 ng of
B.t.k. HD-73 protein was detected per mg of total soluble plant
protein.

As described in Example 3 above, the B.t.k. HD-73 protein encoded in
pMON5383 contains a small C-terminal extension of amino acids not
encoded in the wild-type HD-73 protein. These extra amino acids had no
effect on insect toxicity or on increased plant expression. A second
synthetic HD-73 gene was constructed as described in Example 3
(FIG. 8) and used to transform tobacco (pMON5390). Analysis of plants
containing pMON5390 showed that this gene was expressed at levels
comparable to that of pMON5383 and that these plants had similar
insecticidal efficacy.

In tobacco plants the synthetic HD-1 gene was expressed at
approximately a 5-fold higher level than the synthetic HD-73 gene.
However, this synthetic HD-73 gene still was expressed at least
100-fold better than the wild-type HD-73 gene. The HD-73 protein is
approximately 5-fold more toxic to many insect pests than the HD-1
protein, so both synthetic HD-1 and HD-73 genes provide approximately
comparable insecticidal efficacy in tobacco.

The full length B.t.k. HD-73 genes described in Example 3 were also
incorporated into the plant transformation vector pMON893 so that they
were expressed from the En 35S promoter. The synthetic/wild-type full
length HD-73 gene of FIG. 9 was incorporated into pMON893 to create
pMON10506. The synthetic/modified full length HD-73 gene of FIG. 10
was incorporated into pMON893 to create pMON10526. The fully synthetic
HD-73 gene of FIG. 11 was incorporated into pMON893 to create
pMON10518. These vectors were used to obtain transformed tobacco
plants, and the plants were analyzed for insecticidal efficacy and for
B.t.k. HD-73 protein levels by Western blot or ELISA immunoassay.

Tobacco plants containing all three of these full length B.t.k. genes
produced detectable B.t.k. protein and showed 100% mortality of
tobacco hornworm. This result is surprising in light of previous
reported attempts to express the full length B.t.k. genes in
transgenic plants. Vaeck et al. (1987) reported that a full length
B.t.k. berliner gene similar to our HD-1 gene could not be detectably
expressed in tobacco. Barton et al. (1987) reported a similar result
for another full length gene from B.t.k. HD-1 (the so called 4.5 kb
gene), and further indicated that tobacco callus containing this gene
became necrotic, indicating that the full length gene product was
toxic to plant cells. Fischhoff et al. (1987) reported that the full
length B.t.k. HD-1 gene in tomato was poorly expressed compared to a
truncated gene, and no plants that were fully toxic to tobacco
hornworm could be recovered. All three of the above reports indicated
much higher expression levels and recovery of toxic plants if the
respective B.t.k. genes were truncated. Adang et al. reported that the
full length HD-73 gene yielded a few tobacco plants with some
biological activity (none were highly toxic) against hornworm and
barely detectable B.t.k. protein. It was also noted by them that the
major B.t.k. mRNA in these plants was a truncated 1.7 kb species that
would not encode a functional toxin. This indicated improper
expression of the gene in tobacco. In contrast to all of these
reports, the three full length B.t.k. HD-73 genes described above all
lead to relatively high levels of protein and high levels of insect
toxicity.

B.t.k. protein and mRNA levels in tobacco plants are shown in Table X
for these three vectors. As can be seen from the table, the
synthetic/wild-type gene (pMON10506) produces B.t.k. protein as about
0.01% of total soluble protein; the synthetic/modified gene produces
B.t.k. as about 0.02% of total soluble protein; and the fully
synthetic gene produces B.t.k. as about 0.2% of total soluble protein.
B.t.k. mRNA was analyzed in these plants by Northern blot analysis
using the common 5' synthetic half of the genes as a probe. As shown
in Table X, the increased protein levels can largely be attributed to
increased mRNA levels. Compared to the truncated modified and
synthetic genes, this could indicate that the major contributors to
increased translational efficiency are in the 5' half of the gene
while the 3' half of the gene contains mostly determinants of mRNA
stability. The increased protein levels also indicate that increasing
the amount of the full length gene that is synthetic or modified
increases B.t.k. protein levels. Compared to the truncated synthetic
B.t.k. HD-73 genes (pMON5383 or pMON5390), the fully synthetic gene
(pMON10518) produces as much or slightly more B.t.k. protein,
demonstrating that the full length genes are capable of being
expressed at high levels in plants. These tobacco plants with high
levels of full length HD-73 protein show no evidence of abnormality
and are fully fertile. The B.t.k. protein levels in these plants also
produce the expected levels of insect toxicity based on feeding
studies with beet armyworm or diet incorporation assays of plant
extracts with tobacco budworm. The B.t.k. protein detected by Western
blot analysis in these tobacco plants often contains a varying amount
of protein of about 80 kDa which is apparently a proteolytic fragment
of the full length protein. The C-terminal half of the full length
protein is known to be proteolytically sensitive, and similar
proteolytic fragments are seen from the full length gene in E. coli
and B.t. itself. These fragments are fully insecticidal. The Northern
analysis indicated that essentially all of the mRNA from these full
length genes was of the expected full length size. There is no
evidence of truncated mRNAs that could give rise to the 80 kDa protein
fragment. In addition, it is possible that the fragment is not present
in intact plant cells and is merely due to proteolysis during
extraction for immunoassay.

TABLE X

Full Length B.t.k. HD-73 Protein and
mRNA Levels in Transgenic Tobacco Plants

Gene                                 B.t.k. protein    Relative B.t.k.
description            Vector        concentration     mRNA level

Synthetic/wild type    pMON10506     >100              0.5
Synthetic/modified     pMON10526     400               1
Fully synthetic        pMON10518     >2000             40
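
The percentages of total soluble protein quoted in the text and concentrations expressed in ng per mg of total soluble protein are related by a unit conversion. The Python sketch below assumes the ng/mg convention stated in the footnotes to Tables VIII and XI; the function name is hypothetical.

```python
NG_PER_MG = 1_000_000  # 1 mg = 10^6 ng

def ng_per_mg_to_percent(ng_per_mg: float) -> float:
    """Convert ng of B.t.k. protein per mg of total soluble protein
    into percent of total soluble protein."""
    return 100.0 * ng_per_mg / NG_PER_MG

for value in (100, 2000):
    print(value, ng_per_mg_to_percent(value))
```

On this convention, 100 ng/mg corresponds to about 0.01% and 2000 ng/mg to about 0.2% of total soluble protein.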



Thus, there is no serious impediment to producing high levels of
B.t.k. HD-73 protein in plants from synthetic genes, and this is
expected to be true of other full length lepidopteran active genes
such as B.t.k. HD-1 or B.t. entomocidus. The fully synthetic B.t.k.
HD-1 gene of Example 3 has been assembled in plant transformation
vectors such as pMON893.

The fully synthetic gene in pMON10518 was also utilized in another
plant vector and analyzed in tobacco plants. Although the CaMV35S
promoter is generally a high level constitutive promoter in most plant
tissues, the expression level of genes driven by the CaMV35S promoter
is low in floral tissue relative to the levels seen in leaf tissue.
Because the economically important targets damaged by some insects are
the floral parts or derived from floral parts (e.g., cotton squares
and bolls, tobacco buds, tomato buds and fruit), it may be
advantageous to increase the expression of B.t. protein in these
tissues over that obtained with the CaMV35S promoter.

The 35S promoter of Figwort Mosaic Virus (FMV) is analogous to the
CaMV35S promoter. This promoter has been isolated and engineered into
a plant transformation vector analogous to pMON893. Relative to the
CaMV promoter, the FMV 35S promoter is highly expressed in the floral
tissue, while still providing similar high levels of gene expression
in other tissues such as leaf. A plant transformation vector,
pMON10517, was constructed in which the full length synthetic B.t.k.
HD-73 gene of FIG. 11 was driven by the FMV 35S promoter. This vector
is identical to pMON10518 of Example 3 except that the FMV promoter is
substituted for the CaMV promoter. Tobacco plants transformed with
pMON10517 and pMON10518 were obtained and compared for expression of
the B.t.k. protein by Western blot or ELISA immunoassay in leaf and
floral tissue. This analysis showed that pMON10517 containing the FMV
promoter expressed the full length HD-73 protein at higher levels in
floral tissue than pMON10518 containing the CaMV promoter. Expression
of the full length B.t.k. HD-73 protein from pMON10517 in leaf tissue
is comparable to that seen with the most highly expressing plants
containing pMON10518. However, when floral tissue was analyzed,
tobacco plants containing pMON10518 that had high levels of B.t.k.
protein in leaf tissue did not have detectable B.t.k. protein in the
flowers. On the other hand, flowers of tobacco plants containing
pMON10517 had levels of B.t.k. protein nearly as high as the levels in
leaves at approximately 0.05% of total soluble protein. This analysis
showed that the FMV promoter could be used to produce relatively high
levels of B.t.k. protein in floral tissue compared to the CaMV
promoter.
b) Tomato. 

The wild-type, modified and synthetic B.t.k. HD-1 genes tested in
tobacco were introduced into other plants to demonstrate the broad
utility of this invention. Transgenic tomatoes were produced which
contain these three genes. Data show that the increased expression
observed with the modified and synthetic gene in tobacco also extends
to tomato. Whereas the B.t.k. HD-1 protein is only barely detectable
in plants containing the wild type HD-1 gene (pMON9921), B.t.k. HD-1
was readily detected and the levels determined for plants containing
the modified (pMON5370) or synthetic (pMON5377) genes. Expression
levels for the plants containing the wild-type, modified and synthetic
HD-1 genes were approximately 10, 100 and 500 ng per mg of total plant
extract (see Table XI below). The increase in B.t.k. HD-1 protein for
the modified gene accounted for the majority of the increase observed;
10 fold higher than the plants containing the wild-type gene, compared
to only an additional five-fold increase for plants containing the
synthetic gene. Again the site-directed changes made in the modified
gene are the major contributors to the increased expression of B.t.k.
HD-1.

TABLE XI

B.t.k. HD-1 Expression in Transgenic Tomato Plants

Gene                        B.t.k. Protein    Fold Increase in
Description    Vector       Concentration*    B.t.k. Expression

Wild type      pMON9921     10                1
Modified       pMON5370     100               10
Synthetic      pMON5377     500               50

*B.t.k. HD-1 protein concentrations are expressed in ng/mg of total
soluble plant protein. Data for plants containing the wild-type gene
are estimates from mRNA levels and protein levels determined by ELISA.

These differences in B.t.k. HD-1 expression were confirmed with
bioassays against tobacco hornworm and beet armyworm. Leaves from
tomato plants containing each of these genes controlled tobacco
hornworm damage and produced 100% mortality. With beet armyworm,
leaves from plants containing the wild-type HD-1 gene (pMON9921)
showed significant damage, leaves from plants containing the modified
gene (pMON5370) showed less damage and leaves from plants containing
the synthetic gene (pMON5377) were completely protected (see Table XII
below).

TABLE XII

Protection of Tomato Plants from
Tobacco Hornworm and Beet Armyworm

Gene                        Tobacco Hornworm    Beet Armyworm
Description    Vector       Damage*             Damage*

None           None         NL                  NL
Wild type      pMON9921     0                   3
Modified       pMON5370     0                   1
Synthetic      pMON5377     0                   0

*Damage was rated as shown in Table IX.

The generality of the synthetic gene approach was 
extended in tomato with a synthetic B.t.k. HD-73 gene. 

In tomato, extracts from plants containing the wild-type truncated
HD-73 gene (pMON5367) showed no detectable HD-73 protein. Extracts
from plants containing the synthetic HD-73 gene (pMON5383) showed high
levels of B.t.k. HD-73 protein, approximately 2000 ng per mg of plant
extract protein. These data clearly demonstrate that the changes made
in the synthetic HD-73 gene lead to dramatic increases in the
expression of the HD-73 protein in tomato as well as in tobacco.

In contrast to tobacco, the synthetic HD-73 gene in tomato is
expressed at approximately 4-fold to 5-fold higher levels than the
synthetic HD-1 gene. Because the HD-73 protein is about 5-fold more
active than the HD-1 protein against many insect pests including
Heliothis species, the increased expression of synthetic HD-73
compared to synthetic HD-1 corresponds to about a 25-fold increased
insecticidal efficacy in tomato.

In order to determine the mechanisms involved in the
increased expression of modified and synthetic B.t.k. HD-1
genes in tomato, S1 nuclease analysis of mRNA levels from
transformed tomato plants was performed. As indicated
above, a similar analysis had been performed with tobacco
plants, and this analysis showed that the modified gene
produced up to 10-fold more mRNA than the wild-type
gene. The analysis in tomato utilized a different DNA probe
that allowed the analysis of wild-type (pMON9921),
modified (pMON5370) and synthetic (pMON5377) HD-1 genes
with the same probe. This probe was derived from the 5'
untranslated region of the CaMV35S promoter in pMON893
that was common to all three of these vectors (pMON9921,
pMON5370 and pMON5377). This S1 analysis indicated
that B.t.k. mRNA levels from the modified gene were 3 to 5
fold higher than for the wild-type gene, and that mRNA
levels for the synthetic gene were about 2 to 3 fold higher
than for the modified gene. Three independent transformants
were analyzed for each gene. Compared to the fold increases
in B.t.k. HD-1 protein from these genes in tomato shown in
Table XI, these mRNA increases can explain about half of
the total protein increase, as was seen in tobacco for the
wild-type and modified genes. For tomato the total mRNA
increase from wild-type to synthetic is about 6 to 15 fold
compared to a protein increase of about 50 fold. This result
is similar to that seen for tobacco in comparing the wild-type
and modified genes, and it extends to the synthetic gene as
well. That is, about half of the total fold increase in B.t.k.
protein from wild-type to modified genes can be explained
by mRNA increases and about half by enhanced translational
efficiency. The same is also true in comparing the modified
gene to the synthetic gene. Although there is an additional
increase in mRNA levels, this mRNA increase can explain
only about half of the total protein increase.
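
The fold-change bookkeeping in this paragraph can be checked with a short calculation. The sketch below is illustrative only and is not part of the original disclosure. The function names are hypothetical, and it assumes that fold increases combine multiplicatively, so that the part of the protein increase not accounted for by mRNA is obtained by division.

```python
def combine_fold_ranges(first, second):
    """Multiply two (low, high) fold-change ranges step by step."""
    return (first[0] * second[0], first[1] * second[1])


def residual_fold(protein_fold, mrna_range):
    """Fold increase not accounted for by mRNA, as a (low, high) range."""
    low, high = mrna_range
    return (protein_fold / high, protein_fold / low)


# Figures taken from the text: modified vs. wild-type mRNA (3 to 5 fold),
# synthetic vs. modified mRNA (2 to 3 fold), protein increase about 50 fold.
modified_vs_wild = (3, 5)
synthetic_vs_modified = (2, 3)

total_mrna = combine_fold_ranges(modified_vs_wild, synthetic_vs_modified)
residual = residual_fold(50, total_mrna)

print("total mRNA fold range:", total_mrna)
print("residual (non-mRNA) fold range:", residual)
```

Under these assumptions the combined mRNA range reproduces the "about 6 to 15 fold" stated above, and the remainder is the portion the text attributes to enhanced translational efficiency.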

The full length B.t.k. genes described above were also
used to transform tomato plants and these plants were
analyzed for B.t.k. protein and insecticidal efficacy. The
results of this analysis are shown in Table XIII. Plants
containing the synthetic/wild-type gene (pMON10506)
produce the B.t.k. HD-73 protein at levels of about 0.01% of
their total soluble protein.

Plants containing the synthetic/modified gene
(pMON10526) produce about 0.04% B.t.k. protein, and
plants containing the fully synthetic gene (pMON10518)
produce about 0.2% B.t.k. protein. These results are very
similar to the tobacco plant results for the same genes.
mRNA levels estimated by Northern blot analysis in tomato
also increase in parallel with the protein level increase. As
for tobacco with these three genes, most of the protein
increase can be attributed to increased mRNA, with a small
component of translational efficiency increase indicated for
the fully synthetic gene. The highest levels of full length
B.t.k. protein (from pMON10518) are comparable to or just
slightly lower than the highest levels observed for the
truncated HD-73 genes (pMON5383 and pMON5390).
Tomato plants expressing these full length genes have the
insecticidal activity expected for the observed protein levels
as determined by feeding assays with beet armyworm or by
diet incorporation of plant extracts with tobacco hornworm.

TABLE XIII

Full Length B.t.k. HD-73 Protein and
mRNA Levels in Transgenic Tomato Plants

Gene                               B.t.k. protein   Relative B.t.k.
description           Vector       concentration    mRNA level

Synthetic/wild type   pMON10506    100              1
Synthetic/modified    pMON10526    400              2-4
Fully synthetic       pMON10518    2000             10

c) Cotton.

The generality of the increased expression of B.t.k. HD-1
and B.t.k. HD-73 by use of the modified and synthetic genes
was extended to cotton. Transgenic calli were produced
which contain the wild type (pMON9921) and the synthetic
HD-1 (pMON5377) genes. Here again the B.t.k. HD-1
protein produced from calli containing the wild-type gene
was not detected, whereas calli containing the synthetic
HD-1 gene expressed the HD-1 protein at easily detectable
levels. The HD-1 protein was produced at approximately
1000 ng/mg of plant calli extract protein. Again, to ensure
that the protein produced by the transgenic cotton calli was
biologically active and that the increased expression
observed with the synthetic gene translated to increased
biological activity, extracts of cotton calli were made in
similar manner as described for tobacco plants, except that
the calli were first dried between Whatman filter paper to
remove as much of the water as possible. The dried calli
were then ground in liquid nitrogen and ground in 100 mM
sodium carbonate buffer, pH 10. Approximately 0.5 ml
aliquots of this material were applied to tomato leaves with
a paint brush. After the leaf dried, five tobacco hornworm
larvae were applied to each of two leaf samples. Leaves
painted with extract from control calli were completely
destroyed. Leaves painted with extract from calli containing
the wild-type HD-1 gene (pMON9921) showed severe
damage. Leaves painted with extract from calli containing the
synthetic HD-1 gene (pMON5377) showed no damage (see
Table XIV below).

TABLE XIV

Protection against Tobacco Hornworm by Tomato Leaves
Painted with Extracts Prepared from Cotton Calli
Containing a Control, the Wild-type B.t.k. HD-1 Gene,
Synthetic HD-1 Gene or Synthetic HD-73 Gene

Gene                             Tobacco Hornworm
Description        Vector        Damage*

Control            Control       NL
Wild type HD-1     pMON9921      3
Synthetic HD-1     pMON5377      0
Synthetic HD-73    pMON5383      0

*Damage was rated as shown in Table VIII.

Cotton calli were also produced containing another
synthetic gene, a gene encoding B.t.k. HD-73. The preparation
of this gene is described in Example 3. Calli containing the
synthetic HD-73 gene produced the corresponding HD-73
protein at even higher levels than the calli which contained
the synthetic HD-1 gene. Extracts made from calli containing
the HD-73 synthetic gene (pMON5383) showed complete
control of tobacco hornworm when painted onto
tomato leaves as described above for extracts containing the
HD-1 protein. (See Table XIV).

Transgenic cotton plants containing the synthetic B.t.k.
HD-1 gene (pMON5377) or the synthetic B.t.k. HD-73 gene
(pMON5383) have also been examined. These plants
produce the HD-1 or HD-73 proteins at levels comparable to
that seen in cotton callus with the same genes and
comparable to tomato and tobacco plants with these genes. For
either synthetic truncated HD-1 or HD-73 genes, cotton
plants expressing B.t.k. protein at 1000 to 2000 ng/mg total
protein (0.1% to 0.2%) were recovered at a high frequency.
Insect feeding assays were performed with leaves from
cotton plants expressing the synthetic HD-1 or HD-73
genes. These leaves showed no damage (rating of 0) when
challenged with larvae of cabbage looper (Trichoplusia ni),
and only slight damage when challenged with larvae of beet
armyworm (Spodoptera exigua). Damage ratings are as
defined in Table VIII above. This demonstrated that cotton
plants as well as calli expressed the synthetic HD-1 or
HD-73 genes at high levels and that those plants were
protected from damage by Lepidopteran insect larvae.
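
The equivalence between the ng/mg figures and the percentages quoted above follows from a unit conversion. The following is a minimal sketch of that conversion, offered only as an illustration; the function name is hypothetical and not part of the original disclosure.

```python
def ng_per_mg_to_percent(ng_per_mg):
    """Convert ng of target protein per mg of total protein to a percentage.

    1 mg equals 1,000,000 ng, so the mass fraction is ng_per_mg / 1e6.
    """
    return ng_per_mg / 1_000_000 * 100


# Expression levels quoted in the text for transgenic cotton plants.
for level in (1000, 2000):
    print(level, "ng/mg corresponds to", ng_per_mg_to_percent(level), "percent")
```

The same conversion relates the 100, 400 and 2000 figures to the 0.01%, 0.04% and 0.2% values given for the full length genes, if those figures are also read as ng/mg.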

Transgenic cotton plants containing either the synthetic
truncated HD-1 gene (pMON5377) or the synthetic
truncated HD-73 gene (pMON5383) were also assessed for
protection against cotton bollworm at the whole plant level
in the greenhouse. This is a more realistic test of the ability
of these plants to produce an agriculturally acceptable level
of control. The cotton bollworm (Heliothis zea) is a major
pest of cotton that produces economic damage by destroying
terminals, squares and bolls, and protection of these fruiting
bodies as well as the leaf tissue will be important for
effective insect control and adequate crop protection. To test
the protection afforded to whole plants, R1 progeny of
cotton plants expressing high levels of either B.t.k. HD-1
(pMON5377) or B.t.k. HD-73 (pMON5383) were assayed
by applying 10-15 eggs of cotton bollworm per boll or
square to the 20 uppermost squares or bolls on each plant.
At least 12 plants were analyzed per treatment. The hatch
rate of the eggs was approximately 70%. This corresponds
to very high insect pressure compared to numbers of larvae
per plant seen under typical field conditions. Under these
conditions 100% of the bolls on control cotton plants were
destroyed by insect damage. For the transgenics, significant
boll protection was observed. Plants containing pMON5377
(HD-1) had 70-75% of the bolls survive the intense pressure
of this assay. Plants containing pMON5383 (HD-73) had
80% to 90% boll protection. This is likely to be a
consequence of the higher activity of HD-73 protein against
cotton bollworm compared to HD-1 protein. In cases where
the transgenic plants were damaged by the insects, the
surviving larvae were delayed in their development by at
least one instar.
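
The insect pressure applied in this assay can be estimated from the figures given above. The sketch below is an illustration rather than part of the original disclosure; the function name is hypothetical, and it assumes that every hatched egg yields one larva and that all 20 sites on a plant receive eggs.

```python
def expected_larvae(eggs_per_site, sites_per_plant, hatch_rate):
    """Expected number of hatched larvae on one plant."""
    return eggs_per_site * sites_per_plant * hatch_rate


# 10-15 eggs per boll or square, 20 sites per plant, about 70% hatch rate.
low = expected_larvae(10, 20, 0.70)
high = expected_larvae(15, 20, 0.70)

print("expected larvae per plant: %.0f to %.0f" % (low, high))
```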

Therefore, the increased expression obtained with the
modified and synthetic genes is not limited to any one crop;
tobacco, tomato and cotton calli and cotton plants all showed
drastic increases in B.t.k. expression when the plants/calli
were produced containing the modified or synthetic genes.
Likewise, the utility of changes made to produce the
modified and synthetic B.t.k. HD-1 gene is not limited to the
HD-1 gene. The synthetic HD-73 gene in all three species
also showed drastic increases in expression.

In summary, it has been demonstrated that: (1) the genetic
changes made in the HD-1 modified gene lead to very
significant increases in B.t.k. HD-1 expression; (2)
production of a totally synthetic gene led to a further five-fold
increase in B.t.k. HD-1 expression; (3) the changes
incorporated into the modified HD-1 gene accounted for the
majority of the increased B.t.k. expression observed with the
synthetic gene; (4) the increased expression was
demonstrated in three different plants: tobacco plants, tomato
plants and cotton calli and cotton plants; (5) the increased
expression as observed by Western analysis also correlated
with similar increases in bioactivity, showing that the B.t.k.
HD-1 proteins produced were comparably active; (6) when
the method of the present invention used to design the
synthetic HD-1 gene was employed to design a synthetic
HD-73 gene it also was expressed at much higher levels in
tobacco, tomato and cotton than the wild-type equivalent
gene with consequent increases in bioactivity; (7) a fully
synthetic full length B.t.k. gene was expressed at levels
comparable to synthetic truncated genes.

EXAMPLE 5 

Synthetic B.t. tenebrionis Gene in Tobacco, Tomato 
and Potato 

Referring to FIG. 12, a synthetic gene encoding a
Coleopteran active toxin is prepared by making the indicated
changes in the wild-type gene of B.t. tenebrionis or de novo
synthesis of the synthetic structural gene. The synthetic gene
is inserted into an intermediate plant transformation vector
such as pMON893. Plasmid pMON893 containing the
synthetic B.t.t. gene is then inserted into a suitable disarmed
Agrobacterium strain such as A. tumefaciens AGO.

Transformation and Regeneration of Potato 

Sterile shoot cultures of Russet Burbank are maintained in
vials containing 10 ml of PM medium (Murashige and
Skoog (MS) inorganic salts, 30 g/l sucrose, 0.17 g/l
NaH2PO4.H2O, 0.4 mg/l thiamine-HCl, and 100 mg/l
myo-inositol, solidified with 1 g/l Gelrite at pH 6.0). When shoots
reach approximately 5 cm in length, stem internode
segments of 7-10 mm are excised and smeared at the cut ends
with a disarmed Agrobacterium tumefaciens vector
containing the synthetic B.t.t. gene from a four day old plate culture.
The stem explants are co-cultured for three days at 23° C. on
a sterile filter paper placed over 1.5 ml of a tobacco cell
feeder layer overlaid on 1/10 P medium (1/10 strength MS
inorganic salts and organic addenda without casein as in
Jarret et al. (1980), 30 g/l sucrose and 8.0 g/l agar).
Following co-culture the explants are transferred to full strength
P-1 medium for callus induction, composed of MS inorganic
salts, organic additions as in Jarret et al. (1980) with the
exception of casein, 3.0 mg/l benzyladenine (BA), and 0.01
mg/l naphthaleneacetic acid (NAA) (Jarret et al., 1980).
Carbenicillin (500 mg/l) is included to inhibit bacterial
growth, and 100 mg/l kanamycin is added to select for
transformed cells. After four weeks the explants are
transferred to medium of the same composition but with 0.3 mg/l
gibberellic acid (GA3) replacing the BA and NAA (Jarret et
al., 1981) to promote shoot formation. Shoots begin to
develop approximately two weeks after transfer to shoot
induction medium; these are excised and transferred to vials
of PM medium for rooting. Shoots are tested for kanamycin
resistance conferred by the enzyme neomycin
phosphotransferase II, by placing a section of the stem onto callus
induction medium containing MS organic and inorganic
salts, 30 g/l sucrose, 2.25 mg/l BA, 0.186 mg/l NAA, 10
mg/l GA3 (Webb et al., 1983) and 200 mg/l kanamycin to
select for transformed cells.

The synthetic B.t.t. gene described in FIG. 12 was placed
into a plant expression vector as described in Example 5. The
plasmid has the following characteristics: a synthetic BglII
fragment having approximately 1800 base pairs was inserted
into pMON893 in such a manner that the enhanced 35S
promoter would express the B.t.t. gene. This construct,
pMON1982, was used to transform both tobacco and
tomato. Tobacco plants, selected as kanamycin resistant
plants, were screened with rabbit anti-B.t.t. antibody.
Cross-reactive material was detected at levels predicted to be
suitable to cause mortality to CPB. These target insects will
not feed on tobacco, but the transgenic tobacco plants do
demonstrate that the synthetic gene does improve expression
of this protein to detectable levels.

Tomato plants with the pMON1982 construct were
determined to produce B.t.t. protein at levels insecticidal to CPB.
In initial studies, the leaves of four plants (5190, 5225, 5328
and 5133) showed little or no damage when exposed to CPB
larvae (damage rating of 0-1 on a scale of 0 to 4 with 4 as
no leaf remaining). Under these conditions the control
leaves were completely eaten. Immunological analysis of
these plants confirmed the presence of material
cross-reactive with anti-B.t.t. antibody. Levels of protein
expression in these plants were estimated at approximately 1 to 5 ng
of B.t.t. protein in 50 ug of total extractable protein. A total
of 17 tomato plants (17 of 65 tested) have been identified
which demonstrate protection of leaf tissue from CPB
(rating of 0 or 1) and show good insect mortality.

Results similar to those seen in tobacco and tomato with
pMON1982 were seen with pMON1984 in the same plant
species. pMON1984 is identical to pMON1982 except that
the synthetic protease inhibitor (CMTI) is fused upstream of
the native proteolytic cleavage site. Levels of expression in
tobacco were estimated to be similar to pMON1982,
between 10-15 ng per 50 ug of total soluble protein.

Tomato plants expressing pMON1984 have been
identified which protect the leaves from ingestion by CPB. The
damage rating was 0 with 100% insect mortality.

Potato was transformed as described in Example 5 with a
vector similar to pMON1982 containing the enhanced
CaMV35S/synthetic B.t.t. gene. Leaves of potato plants
transformed with this vector were screened by CPB insect
bioassay. Of the 35 plants tested, leaves from 4 plants, 16a,
13c, 13d, and 23a, were totally protected when challenged.
Insect bioassays with leaves from three other plants, 13e, 1a,
and 13b, recorded damage levels of 1 on a scale of 0 to 4
with 4 being total devastation of the leaf material.
Immunological analysis confirmed the presence of B.t.t.
cross-reactive material in the leaf tissue. The level of B.t.t. protein
in leaf tissue of plant 16a (damage rating of 0) was estimated
at 20-50 ng of B.t.t. protein/50 ug of total soluble protein.
The levels of B.t.t. protein seen in 16a tissue were consistent
with its biological activity. Immunological analysis of 13e
and 13b (tissue which scored 1 in damage rating) revealed less
protein (5-10 ng/50 ug of total soluble protein) than in plant
16a. Cuttings of plant 16a were challenged with 50 to 200
eggs of CPB in a whole plant assay. Under these conditions
16a showed no damage and 100% mortality of insects while
control potato plants were heavily damaged.

EXAMPLE 6 

Synthetic B.t.k. P2 Protein Gene

The P2 protein is a distinct insecticidal protein produced
by some strains of B.t. including B.t.k. HD-1. It is
characterized by its activity against both lepidopteran and dipteran
insects (Yamamoto and Iizuka, 1983). Genes encoding the
P2 protein have been isolated and characterized (Donovan et
al., 1988). The P2 proteins encoded by these genes are
approximately 600 amino acids in length. These proteins
share only limited homology with the lepidopteran specific
P1 type proteins, such as the B.t.k. HD-1 and HD-73 proteins
described in previous examples.

The P2 proteins have substantial activity against a variety
of lepidopteran larvae including cabbage looper, tobacco
hornworm and tobacco budworm. Because they are active
against agronomically important insect pests, the P2 proteins
are a desirable candidate in the production of insect tolerant
transgenic plants either alone or in combination with the
other B.t. toxins described in the above examples. In some
plants, expression of the P2 protein alone might be sufficient
to provide protection against damaging insects. In addition,
the P2 proteins might provide protection against
agronomically important dipteran pests. In other cases, expression of
P2 together with the B.t.k. HD-1 or HD-73 protein might be
preferred. The P2 proteins should provide at least an additive
level of insecticidal activity when combined with the crystal
protein toxin of B.t.k. HD-1 or HD-73, and the combination
may even provide a synergistic activity. Although the mode
of action of the P2 protein is unknown, its distinct amino
acid sequence suggests that it functions differently from the
B.t.k. HD-1 and HD-73 type of proteins. Production of two
insect tolerance proteins with different modes of action in
the same plant would minimize the potential for
development of insect resistance to B.t. proteins in plants. The lack
of substantial DNA homology between P2 genes and the
HD-1 and HD-73 genes minimizes the potential for
recombination between multiple insect tolerance genes in the plant
chromosome. The genes encoding the P2 protein, although
distinct in sequence from the B.t.k. HD-1 and HD-73 genes,
share many common features with these genes. In particular,
the P2 protein genes have a high A+T content (65%),
multiple potential polyadenylation signal sequences (26)
and numerous ATTTA sequences (10). Because of its overall
similarity to the poorly expressed wild-type B.t.k. HD-1 and
HD-73 genes, the same problems are expected in expression
of the wild-type P2 gene as were encountered with the
previous examples. Based on the above-described method
for designing the synthetic B.t. genes, a synthetic P2 gene
has been designed which should be expressed at
adequate levels for protection in plants. A comparison of the
wild-type and synthetic P2 genes is shown in FIG. 13.
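
The three sequence features counted above (A+T content, potential polyadenylation signals and ATTTA sequences) can be tallied for any coding sequence with a short script. The following is a hedged sketch, not the method used in the original work. The function names and the example sequence are hypothetical, and the signal list is only a placeholder: AACCAA is named elsewhere in this specification as a potential plant polyadenylation signal and AATAAA is the well-known canonical signal, but the full list used for the counts above is not reproduced here.

```python
def at_content(seq):
    """Percentage of A and T residues in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(base) for base in "AT") / len(seq)


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    seq, motif = seq.upper(), motif.upper()
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def scan_gene(seq, polya_signals):
    """Summarize the features the text associates with poor plant expression."""
    return {
        "at_percent": at_content(seq),
        "polya_signals": sum(count_overlapping(seq, s) for s in polya_signals),
        "attta": count_overlapping(seq, "ATTTA"),
    }


# Placeholder signal list and made-up sequence, for illustration only.
ILLUSTRATIVE_SIGNALS = ("AATAAA", "AACCAA")
example = "ATGAATTTATTAAACCAAGGATTTAAATAAAGC"
print(scan_gene(example, ILLUSTRATIVE_SIGNALS))
```

Overlapping matches are counted because adjacent A+T rich motifs frequently share residues; whether the original counts treated overlaps this way is not stated.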

EXAMPLE 7 

Synthetic B.t. Entomocidus Gene 

The B.t. entomocidus ("Btent") protein is a distinct
insecticidal protein produced by some strains of B.t. bacteria. It
is characterized by its high level of activity against some
lepidopterans that are relatively insensitive to B.t.k. HD-1
and HD-73 such as Spodoptera species including beet
armyworm (Visser et al., 1988). Genes encoding the Btent protein
have been isolated and characterized (Honee et al., 1988).
The Btent proteins encoded by these genes are
approximately the same length as B.t.k. HD-1 and HD-73. These
proteins share only 68% amino acid homology with the
B.t.k. HD-1 and HD-73 proteins. It is likely that only the
N-terminal half of the Btent protein is required for
insecticidal activity as is the case for HD-1 and HD-73. Over the
first 625 amino acids, Btent shares only 38% amino acid
homology with HD-1 and HD-73.

Because of their higher activity against Spodoptera spe- 
cies that are relatively insensitive to HD-1 and HD-73, the 
Btent proteins are a desirable candidate for the production of 
insect tolerant transgenic plants either alone or in
combination with the other B.t. toxins described in the above
examples. In some plants production of Btent alone might be 
sufficient to control the agronomically important pests. In 
other plants, the production of two distinct insect tolerance 
proteins would provide protection against a wider array of 
insects. Against those insects where both proteins are active, 
the combination of the B.t.k. HD-1 or HD-73 type protein 
plus the Btent protein should provide at least additive 
insecticidal efficacy, and may even provide a synergistic 
activity. In addition, because of its distinct amino acid 
sequence, the Btent protein may have a different mode of 
action than HD-1 or HD-73. Production of two insecticidal 
proteins in the same plant with different modes of action 
would minimize the potential for development of insect 
resistance to B.t. proteins in plants. The relative lack of DNA 
sequence homology with the B.t.k. type genes minimizes the 
potential for recombination between multiple insect toler- 
ance genes in the plant chromosome. 

The genes encoding the Btent protein, although distinct in
sequence from the B.t.k. HD-1 and HD-73 genes, share many
common features with these genes. In particular, the Btent
protein genes have a high A+T content (62%), multiple
potential polyadenylation signal sequences (39 in the full
length coding sequence and 27 in the first 1875 nucleotides,
which are likely to encode the active toxic fragment) and
numerous ATTTA sequences (16 in the full length coding
sequence and 12 in the first 1875 nucleotides). Because of its
overall similarity to the poorly expressed wild-type B.t.k.
HD-1 and HD-73 genes, the wild-type Btent genes are
expected to exhibit similar problems in expression as were
encountered with the wild-type HD-1 and HD-73 genes.
Based on the above-described method used for designing the
other synthetic B.t. genes, a synthetic Btent gene has been
designed which should be expressed at adequate levels
for protection in plants. A comparison of the wild-type and
synthetic Btent genes is shown in FIG. 14.

EXAMPLE 8 

Synthetic B.t.k. Genes for Expression in Corn

High level expression of heterologous genes in corn cells
has been shown to be enhanced by the presence of a corn
gene intron (Callis et al., 1987). Typically these introns have
been located in the 5' untranslated region of the chimeric
gene. It has been shown that the CaMV35S promoter and the
NOS 3' end function efficiently in the expression of
heterologous genes in corn cells (Fromm et al., 1986).

Referring to FIG. 15, a plant expression cassette vector
(pMON744) was constructed that contains these sequences.
Specifically the expression cassette contains the enhanced
CaMV 35S promoter followed by intron 1 of the corn Adh1
gene (Callis et al., 1987). This is followed by a multilinker
cloning site for insertion of coding sequences; this
multilinker contains a BglII site among others. Following the
multilinker is the NOS 3' end. pMON744 also contains the
selectable marker gene 35S/NPTII/NOS 3' for kanamycin
selection of transgenic corn cells. In addition, pMON744 has
an E. coli origin of replication and an ampicillin resistance
gene for selection of the plasmid in E. coli.

Five B.t.k. coding sequences described in the previous
examples were inserted into the BglII site of pMON744 for
corn cell expression of B.t.k. The coding sequences inserted
and resulting vectors were:

1. Wild type B.t.k. HD-1 from pMON9921 to make
pMON8652.

2. Modified B.t.k. HD-1 from pMON5370 to make
pMON8642.

3. Synthetic B.t.k. HD-1 from pMON5377 to make
pMON8643.

4. Synthetic B.t.k. HD-73 from pMON5390 to make
pMON8644.

5. Synthetic full length B.t.k. HD-73 from pMON10518 to
make pMON10902.

pMON8652 (wild-type B.t.k. HD-1) was used to
transform corn cell protoplasts and stably transformed kanamycin
resistant callus was isolated. B.t.k. mRNA in the corn cells
was analyzed by nuclease S1 protection and found to be
present at a level comparable to that seen with the same
wild-type coding sequence (pMON9921) in transgenic
tomato plants.

pMON8652 and pMON8642 (modified HD-1) were used
to transform corn cell protoplasts in a transient expression
system. The level of B.t.k. mRNA was analyzed by nuclease
S1 protection. The modified HD-1 gave rise to a several fold
increase in B.t.k. mRNA compared to the wild-type coding
sequence in the transiently transformed corn cells. This
indicated that the modifications introduced into the B.t.k.
HD-1 gene are capable of enhancing B.t.k. expression in
monocot cells as was demonstrated for dicot plants and cells.

pMON8642 (modified HD-1) and pMON8643 (synthetic 
HD-1) were used to transform Black Mexican Sweet (BMS) 
corn cell protoplasts by PEG-mediated DNA uptake, and 
stably transformed corn callus was selected by growth on
kanamycin containing plant growth medium. Individual 
callus colonies that were derived from single transformed 
cells were isolated and propagated separately on kanamycin 
containing medium. 

To assess the expression of the B.t.k. genes in these cells,
callus samples were tested for insect toxicity by bioassay
against tobacco hornworm larvae. For each vector, 96 callus
lines were tested by bioassay. Portions of each callus were
placed on sterile water agar plates, and five neonate tobacco
hornworm larvae were added and allowed to feed for 4 days.
For pMON8643, 100% of the larvae died after feeding on 15
of the 96 calli and these calli showed little feeding damage.
For pMON8642, only 1 of the 96 calli was toxic to the
larvae. This showed that the B.t.k. gene was being expressed
in these samples at insecticidal levels. The observation that
significantly more calli containing pMON8643 were toxic
than for pMON8642 showed that significantly higher levels
of expression were obtained when the synthetic HD-1
coding sequence was contained in corn cells than when the
modified HD-1 coding sequence was used, similar to the
previous examples with dicot plants. A semiquantitative
immunoassay showed that the pMON8643 toxic samples
had significantly higher B.t.k. protein levels than the
pMON8642 toxic sample.
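
The difference between 15 of 96 and 1 of 96 toxic calli can be examined with a one-sided Fisher exact test. The text does not state which statistical test, if any, was applied, so the sketch below is only one reasonable way to check the comparison; the function name is hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for a 2x2 table.

    Group 1 has a hits and b misses; group 2 has c hits and d misses.
    Returns the hypergeometric probability that group 1 would have at
    least a hits if hits were distributed at random between the groups.
    """
    n1 = a + b
    hits = a + c
    total = a + b + c + d
    denominator = comb(total, n1)
    p_value = 0.0
    for k in range(a, min(hits, n1) + 1):
        p_value += comb(hits, k) * comb(total - hits, n1 - k) / denominator
    return p_value


# Toxic versus non-toxic calli, taken from the text:
# pMON8643 (synthetic HD-1): 15 toxic of 96; pMON8642 (modified HD-1): 1 of 96.
p = fisher_one_sided(15, 81, 1, 95)
print("one-sided p-value: %.2e" % p)
```

A small p-value here would support the statement that more calli were toxic with the synthetic gene, on the assumption that the 96 lines per vector were independent transformants.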

The 16 callus samples that were toxic to tobacco
hornworm were also tested for activity against European corn
borer. European corn borer is approximately 40-fold less
sensitive to the HD-1 gene product than is tobacco
hornworm. Larvae of European corn borer were applied to the
callus samples and allowed to feed for 4 days. Two of the 16
calli tested, both of which contained pMON8643 (synthetic
HD-1), were toxic to European corn borer larvae.

To assess the expression of the B.t.k. genes in
differentiated corn tissue, another method of DNA delivery was used.
Young leaves were excised from corn plants, and DNA
samples were delivered into the leaf tissue by
microprojectile bombardment. In this system, the DNA on the
microprojectiles is transiently expressed in the leaf cells after
bombardment. Three DNA samples were used, and each
DNA was tested in triplicate.

1. pMON744, the corn expression vector with no B.t.k.
gene.

2. pMON8643 (synthetic HD-1).

3. pMON752, a corn expression vector for the GUS gene,
no B.t.k. gene.

The leaves were incubated at room temperature for 24
hours. The pMON752 samples were stained with a substrate
that allows visual detection of the GUS gene product. This
analysis showed that over one hundred spots in each sample
were expressing the GUS product and the triplicate
samples showed very similar levels of GUS expression. For
the pMON744 and pMON8643 samples 5 larvae of tobacco
hornworm were added to each leaf and allowed to feed for
48 hours. All three samples bombarded with pMON744
showed extensive feeding damage and no larval mortality.
All three samples bombarded with pMON8643 showed no
evidence of feeding damage and 100% larval mortality. The
samples were also assayed for the presence of B.t.k. protein
by a qualitative immunoassay. All of the pMON8643
samples had detectable B.t.k. protein. These results
demonstrated that the synthetic B.t.k. gene was expressed in
differentiated corn plant tissue at insecticidal levels.

EXAMPLE 9 

Synthetic Potato Leaf Roll Virus Coat Protein Gene 

Expression in plants of the coat protein genes from a 
variety of plant viruses has proven to be an effective method 
of engineering resistance to these viruses. In order to achieve 
virus resistance, it is important to express the viral coat 
protein at an effective level. For many plant virus coat 
protein genes, this has not proved to be a problem. However, 
for the coat protein gene from potato leaf roll virus (PLRV), 
expression of the coat protein has been observed to be low 
relative to other coat protein genes, and this lower level of 
protein has not led to optimal resistance to PLRV. 

The gene for PLRV coat protein is shown in FIG. 16.
Referring to FIG. 16, the upper line of sequence shows the
gene as it was originally engineered for plant expression in
vector pMON893. The gene was contained on a 749
nucleotide BglII-EcoRI fragment with the coding sequence
contained between nucleotides 20 and 643. This fragment also
contained 19 nucleotides of 5' noncoding sequence and 104
nucleotides of 3' noncoding sequence. This PLRV coat
protein gene was relatively poorly expressed in plants
compared to other viral coat protein genes.

A synthetic gene was designed to improve plant
expression of the PLRV coat protein. Referring again to FIG. 16,
the changes made in the synthetic PLRV gene are shown in
the lower line. This gene was designed to encode exactly the
same protein as the naturally occurring gene. Note that the
beginning of the synthetic gene is at nucleotide 14 and the
end of the sequence is at nucleotide 654. The coding
sequence for the synthetic gene is from nucleotide 20 to 643
of the figure. The changes indicated just upstream and
downstream of these endpoints serve only to introduce
convenient restriction sites just outside the coding sequence.
Thus the size of the synthetic gene is 641 nucleotides, which
is smaller than the naturally occurring gene. The synthetic
gene is smaller because substantially all of the noncoding
sequence at both the 5' and 3' ends, except for segments
encoding the BglII and EcoRI restriction sites, has been
removed.
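
The 641 nucleotide figure follows from the stated endpoints when both are counted inclusively. The short sketch below works through that arithmetic as an illustration; the function name is hypothetical, and whether the resulting codon count includes the stop codon is not stated in the text.

```python
def span_length(first, last):
    """Length of a nucleotide interval with both endpoints included."""
    return last - first + 1


# Endpoints taken from the text (numbering of FIG. 16).
gene_length = span_length(14, 654)    # whole synthetic gene
coding_length = span_length(20, 643)  # coding sequence only
codons = coding_length // 3

print("synthetic gene length:", gene_length)
print("coding sequence length:", coding_length)
print("codons in coding sequence:", codons)
```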

The synthetic gene differs from the naturally occurring
gene in two main respects. First, 41 individual codons within
the coding sequence have been changed to remove nearly all
codons for a given amino acid that constitute less than about
15% of the codons for that amino acid in a survey of dicot
plant genes. Second, the 5' and 3' noncoding sequences of
the original gene have been removed. Although not strictly
conforming to the algorithm described in FIG. 1, a few of the
codon changes and especially the removal of the long 3'
noncoding region are consistent with this algorithm.

The original PLRV sequence contains two potential plant
polyadenylation signals (AACCAA and AAGCAT) and both
of these occur in the 3' noncoding sequence that has been
removed in the synthetic gene. The original PLRV gene also
contains one ATTTA sequence. This is also contained in the
3' noncoding sequence, and is in the midst of the longest
stretch of uninterrupted A+T in the gene (a stretch of 7 A+T
nucleotides). This sequence was removed in the synthetic
gene. Thus, sequences that the algorithm of FIG. 1 targets
for change have been changed in the synthetic PLRV coat
protein gene by removal of the 3' noncoding segment.
Within the coding sequence, codon changes were also made
to remove three other regions of sequence described above.
In particular, two regions of 5 consecutive A+T and one
region of 5 consecutive G+C within the coding sequence
have been removed in the synthetic gene.
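
Runs of consecutive A+T or G+C residues of the kind described above can be located with a regular expression. The following is a minimal sketch under stated assumptions, not the algorithm of FIG. 1: the function name and example sequence are hypothetical, and the threshold of 5 is taken from the run lengths mentioned in this paragraph rather than from the algorithm itself.

```python
import re


def find_runs(seq, letters, min_len=5):
    """Return (1-based start, run) for each run of the given residues.

    letters is "AT" to find A+T runs or "GC" to find G+C runs.
    """
    pattern = re.compile("[%s]{%d,}" % (letters, min_len))
    return [(m.start() + 1, m.group()) for m in pattern.finditer(seq.upper())]


# Made-up sequence for illustration only.
example = "GCATTTATGCGGCCGAT"
print("A+T runs:", find_runs(example, "AT"))
print("G+C runs:", find_runs(example, "GC"))
```

Each reported run is a candidate region where a synonymous codon change could interrupt the stretch without altering the encoded protein.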

The synthetic PLRV coat protein gene is cloned in a plant
transformation vector such as pMON893 and used to
transform potato plants as described above. These plants express
the PLRV coat protein at higher levels than achieved with
the naturally occurring gene, and these plants exhibit
increased resistance to infection by PLRV.

EXAMPLE 10 

Expression of Synthetic B.t. Genes with RUBISCO 
Small Subunit Promoters and Chloroplast Transit 
Peptides 

The genes in plants encoding the small subunit of 
RUBISCO (SSU) are often highly expressed, light regulated 
and sometimes show tissue specificity. These expression 
properties are largely due to the promoter sequences of these 
genes. It has been possible to use SSU promoters to express 
heterologous genes in transformed plants. Typically a plant 
will contain multiple SSU genes, and the expression levels 
and tissue specificity of different SSU genes will be differ- 
ent. The SSU proteins are encoded in the nucleus and 
synthesized in the cytoplasm as precursors that contain an 
N-terminal extension known as the chloroplast transit pep- 
tide (CTP). The CTP directs the precursor to the chloroplast 
and promotes the uptake of the SSU protein into the 
chloroplast. In this process, the CTP is cleaved from the SSU 
protein. These CTP sequences have been used to direct 
heterologous proteins into chloroplasts of transformed 
plants. 

The SSU promoters might have several advantages for 
expression of B.t.k. genes in plants. Some SSU promoters 
are very highly expressed and could give rise to expression 
levels as high or higher than those observed with the 
CaMV35S promoter. The tissue distribution of expression 
from SSU promoters is different from that of the CaMV35S 
promoter, so for control of some insect pests, it may be 
advantageous to direct the expression of B.t.k. to those cells 
in which SSU is most highly expressed. For example, 
although relatively constitutive, in the leaf the CaMV35S 
promoter is more highly expressed in vascular tissue than in 
some other parts of the leaf, while most SSU promoters are 
most highly expressed in the mesophyll cells of the leaf. 
Some SSU promoters also are more highly tissue specific, so 
it could be possible to utilize a specific SSU promoter to 
express B.t.k. in only a subset of plant tissues, if for example 
B.t. expression in certain cells was found to be deleterious to 
those cells. For example, for control of Colorado potato 
beetle in potato, it may be advantageous to use SSU 
promoters to direct B.t.t. expression to the leaves but not to the 
edible tubers. 

Utilizing SSU CTP sequences to localize B.t. proteins to 
the chloroplast might also be advantageous. Localization of 
the B.t. to the chloroplast could protect the protein from 
proteases found in the cytoplasm. This could stabilize the 
B.t. protein and lead to higher levels of accumulation of 
active protein. B.t. genes containing the CTP could be used 
in combination with the SSU promoter or with other 
promoters such as CaMV35S. 

A variety of plant transformation vectors were constructed 
for the expression of B.t.k. genes utilizing SSU promoters 
and SSU CTPs. The promoters and CTPs utilized were from 
the petunia SSU11a gene described by Tumer et al. (1986) 
and from the Arabidopsis ats1A gene (an SSU gene) 
described by Krebbers et al. (1988) and by Elionor et al. 
(1989). The petunia SSU11a promoter was contained on a 
DNA fragment that extended approximately 800 bp 
upstream of the SSU coding sequence. The Arabidopsis 
ats1A promoter was contained on a DNA fragment that 
extended approximately 1.8 kb upstream of the SSU coding 
sequence. At the upstream end convenient sites from the 
multilinker of pUC18 were used to move these promoters 
into plant transformation vectors such as pMON893. These 
promoter fragments extended to the start of the SSU coding 
sequence at which point an NcoI restriction site was 
engineered to allow insertion of the B.t. coding sequence, 
replacing the SSU coding sequence. 

When SSU promoters were used in combination with 
their CTP, the DNA fragments extended through the coding 
sequence of the CTP and a small portion of the mature SSU 
coding sequence at which point an NcoI restriction site was 
engineered by standard techniques to allow the in frame 
fusion of B.t. coding sequences with the CTP. In particular, 
for the petunia SSU11a CTP, B.t. coding sequences were 
fused to the SSU sequence after amino acid 8 of the mature 
SSU sequence at which point the NcoI site was placed. The 
8 amino acids of mature SSU sequence were included 
because preliminary in vitro chloroplast uptake experiments 
indicated that uptake of B.t.k. was observed only if this 
segment of mature SSU was included. For the Arabidopsis 
ats1A CTP, the complete CTP was included plus 24 amino 
acids of mature SSU sequence plus the sequence 
gly-gly-arg-val-asn-cys-met-gln-ala-met (Sequence ID NO. 19), 
terminating in an NcoI site for B.t. fusion. This short sequence 
reiterates the native SSU CTP cleavage site (between the cys 
and met) plus a short segment surrounding the cleavage site. 
This sequence was included in order to insure proper uptake 
into chloroplasts. B.t. coding sequences were fused to this 
ats1A CTP after the met codon. In vitro uptake experiments 
with this CTP construction and other (non-B.t.) coding 
sequences showed that this CTP did target proteins to the 
chloroplast. 



When CTPs were used in combination with the CaMV 
35S promoter, the same CTP segments were used. They 
were excised just upstream of the ATG start sites of the CTP 
by engineering of BglII sites, and placed downstream of the 
CaMV35S promoter in pMON893, as BglII to NcoI 
fragments. B.t. coding sequences were fused as described above. 

The wild type B.t.k. HD-1 coding sequence of 
pMON9921 (see FIG. 1) was fused to the ats1A promoter to 
make pMON1925 or the ats1A promoter plus CTP to make 
pMON1921. These vectors were used to transform tobacco 
plants, and the plants were screened for activity against 
tobacco hornworm. No toxic plants were recovered. This is 
surprising in light of the fact that toxic plants could be 
recovered, albeit at a low frequency, after transformation 
with pMON9921 in which the B.t.k. coding sequence was 
expressed from the enhanced CaMV35S promoter in 
pMON893, and in light of the fact that Elionor et al. (1989) 
report that the ats1A promoter itself is comparable in 
strength to the CaMV35S promoter and approximately 
10-fold stronger when the CTP sequence is included. At least 
for the wild-type B.t.k. HD-1 coding sequence, this does not 
appear to be the case. 

A variety of plant transformation vectors were constructed 
utilizing either the truncated synthetic HD-73 coding 
sequence of FIG. 4 or the full length B.t.k. HD-73 coding 
sequence of FIG. 11. These are listed in the table below. 

TABLE XV 

Gene Constructs with CTPs 

                                     B.t.k. HD-73 
Vector       Promoter    CTP         Coding Sequence 

pMON10806    En 35S      ats1A       truncated 
pMON10814    En 35S      SSU11a      full length 
pMON10811    SSU11a      SSU11a      truncated 
pMON10819    SSU11a      none        truncated 
pMON10815    ats1A       none        truncated 
pMON10817    ats1A       ats1A       truncated 
pMON10821    En 35S      ats1A       truncated 
pMON10822    En 35S      ats1A       full length 
pMON10838    SSU11a      SSU11a      full length 
pMON10839    ats1A       ats1A       full length 

All of the above vectors were used to transform tobacco 
plants. For all of the vectors containing truncated B.t.k. 
genes, leaf tissue from these plants has been analyzed for 
toxicity to insects and B.t.k. protein levels by immunoassay. 
pMON10806, 10811, 10819 and 10821 produce levels of 
B.t.k. protein comparable to pMON5383 and pMON5390 
which contain synthetic B.t.k. HD-73 coding sequences 
driven by the En 35S promoter itself with no CTP. These 
plants also have the insecticidal activity expected for the 
B.t.k. protein levels detected. For pMON10815 and 
pMON10817 (containing the ats1A promoter) the level of 
B.t.k. protein is about 5-fold higher than that found in plants 
containing pMON5383 or 5390. These plants also have 
higher insecticidal activity. Plants containing 10815 and 
10817 contain up to 1% of their total soluble leaf protein as 
B.t.k. HD-73. This is the highest level of B.t.k. protein yet 
obtained with any of the synthetic genes. 

This result is surprising in two respects. First, as noted 
above, the wild type coding sequences fused to the ats1A 
promoter and CTP did not show any evidence of higher 
levels of expression than for En 35S, and in fact had lower 
expression based on the absence of any insecticidal plants. 
Second, Elionor et al. (1989) show that for two other genes, 
the ats1A CTP can increase expression from the ats1A 
promoter by about 10-fold. For the synthetic B.t.k. HD-73 
gene, there is no consistent increase seen by including the 
CTP over and above that seen for the ats1A promoter alone. 

Tobacco plants containing the full length synthetic HD-73 
fused to the SSU11a CTP and driven by the En 35S 
promoter produced levels of B.t.k. protein and insecticidal 
activity comparable to pMON10518 which does not include 
the CTP. In addition, for pMON10518 the B.t.k. protein 
extracted from plants was observed by gel electrophoresis to 
contain multiple forms less than full length, apparently due 
to the cleavage of the C-terminal portion (not required for 
toxicity) in the cytoplasm. For pMON10814, the majority of 
the protein appeared to be intact full length indicating that 
the protein has been stabilized from proteolysis by targeting 
to the chloroplast. 

EXAMPLE 11 

Targeting of B.t. Proteins to the Extracellular Space 
or Vacuole through the Use of Signal Peptides 

The B.t. proteins produced from the synthetic genes 
described here are localized to the cytoplasm of the plant 
cell, and this cytoplasmic localization results in plants that 
are insecticidally effective. It may be advantageous for some 
purposes to direct the B.t. proteins to other compartments of 
the plant cell. Localizing B.t. proteins in compartments other 
than the cytoplasm may result in less exposure of the B.t. 
proteins to cytoplasmic proteases, leading to greater 
accumulation of the protein and yielding enhanced insecticidal 
activity. Extracellular localization could lead to more efficient 
exposure of certain insects to the B.t. proteins leading to 
greater efficacy. If a B.t. protein were found to be deleterious 
to plant cell function, then localization to a noncytoplasmic 
compartment could protect these cells from the protein. 

In plants as well as other eucaryotes, proteins that are 
destined to be localized either extracellularly or in several 
specific compartments are typically synthesized with an 
N-terminal amino acid extension known as the signal 
peptide. This signal peptide directs the protein to enter the 
compartmentalization pathway, and it is typically cleaved 
from the mature protein as an early step in 
compartmentalization. For an extracellular protein, the secretory pathway 
typically involves cotranslational insertion into the 
endoplasmic reticulum with cleavage of the signal peptide 
occurring at this stage. The mature protein then passes through the 
Golgi body into vesicles that fuse with the plasma 
membrane thus releasing the protein into the extracellular space. 
Proteins destined for other compartments follow a similar 
pathway. For example, proteins that are destined for the 
endoplasmic reticulum or the Golgi body follow this 
scheme, but they are specifically retained in the appropriate 
compartment. In plants, some proteins are also targeted to 
the vacuole, another membrane bound compartment in the 
cytoplasm of many plant cells. Vacuole targeted proteins 
diverge from the above pathway at the Golgi body where 
they enter vesicles that fuse with the vacuole. 

A common feature of this protein targeting is the signal 
peptide that initiates the compartmentalization process. 
Fusing a signal peptide to a protein will in many cases lead to 
the targeting of that protein to the endoplasmic reticulum. 
The efficiency of this step may depend on the sequence of 
the mature protein itself as well. The signals that direct a 
protein to a specific compartment rather than to the 
extracellular space are not as clearly defined. It appears that many 
of the signals that direct the protein to specific compartments 
are contained within the amino acid sequence of the mature 
protein. This has been shown for some vacuole targeted 
proteins, but it is not yet possible to define these sequences 
precisely. It appears that secretion into the extracellular 
space is the "default" pathway for a protein that contains a 
signal sequence but no other compartmentalization signals. 
Thus, a strategy to direct B.t. proteins out of the cytoplasm 
is to fuse the synthetic B.t. genes to DNA 
sequences encoding known plant signal peptides. These 
fusion genes will give rise to B.t. proteins that enter the 
secretory pathway, and lead to extracellular secretion or 
targeting to the vacuole or other compartments. 

Signal sequences for several plant genes have been 
described. One such sequence is for the tobacco 
pathogenesis related protein PR1b described by Cornelissen et al. The 
PR1b protein is normally localized to the extracellular 
space. Another type of signal peptide is contained on seed 
storage proteins of legumes. These proteins are localized to 
the protein body of seeds, which is a vacuole like 
compartment found in seeds. A signal peptide DNA sequence for the 
beta subunit of the 7S storage protein of common bean 
(Phaseolus vulgaris), PvuB, has been described by Doyle et 
al. Based on these published sequences, genes 
were synthesized by chemical synthesis of oligonucleotides 
that encoded the signal peptides for PR1b and PvuB. The 
synthetic genes for these signal peptides corresponded 
exactly to the reported DNA sequences. Just upstream of the 
translational initiation codon of each signal peptide a BamHI 
and a BglII site were inserted with the BamHI site at the 5' 
end. This allowed the insertion of the signal peptide 
encoding segments into the BglII site of pMON893 for expression 
from the En 35S promoter. In some cases to achieve 
secretion or compartmentalization of heterologous proteins, it has 
proved necessary to include some amino acid sequence 
beyond the normal cleavage site of the signal peptide. This 
may be necessary to insure proper cleavage of the signal 
peptide. For PR1b the synthetic DNA sequence also 
included the first 10 amino acids of mature PR1b. For PvuB 
the synthetic DNA sequence included the first 13 amino 
acids of mature PvuB. Both synthetic signal peptide 
encoding segments ended with NcoI sites to allow fusion in frame 
to the methionine initiation codon of the synthetic B.t. genes. 
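
The in-frame fusion at an NcoI site described above can be sketched in a few lines of Python. The NcoI recognition sequence CCATGG contains an ATG codon, which is what allows the site to coincide with a methionine initiation codon. Everything else in this sketch is a simplifying assumption made for illustration: the function name and the toy sequences are hypothetical, the leader is assumed to begin at its own ATG, and the coding sequence is assumed to begin with ATGG so that the site is recreated at the junction.

```python
NCOI_SITE = "CCATGG"  # NcoI recognition sequence; it contains an ATG codon


def fuse_at_ncoi(leader, coding):
    """Join a leader ending in CCATGG to a coding sequence starting ATGG.

    The ATGG at the end of the leader is replaced by the ATGG at the start
    of the coding sequence, so the NcoI site is kept at the junction.
    Raises ValueError if the inputs do not fit these assumptions or if the
    coding sequence would be out of frame with the leader's own ATG.
    """
    leader, coding = leader.upper(), coding.upper()
    if not leader.startswith("ATG"):
        raise ValueError("leader must begin with its ATG start codon")
    if not leader.endswith(NCOI_SITE):
        raise ValueError("leader must end with an NcoI site (CCATGG)")
    if not coding.startswith("ATGG"):
        raise ValueError("coding sequence must begin with ATGG")

    kept = leader[:-4]  # drop ATGG; the coding sequence supplies it
    if len(kept) % 3 != 0:
        raise ValueError("fusion would be out of frame")
    return kept + coding


if __name__ == "__main__":
    toy_leader = "ATGGCTTCTGCCATGG"   # hypothetical signal-peptide segment
    toy_coding = "ATGGACAACTAA"       # hypothetical coding sequence
    print(fuse_at_ncoi(toy_leader, toy_coding))
```

The frame check is the important part: if the retained leader is not a whole number of codons long, the downstream coding sequence would be read in the wrong frame.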

Four vectors encoding synthetic B.t.k. HD-73 genes were 
constructed containing these signal peptides. The synthetic 
truncated HD-73 gene from pMON5383 was fused with the 
signal peptide sequence of PvuB and incorporated into 
pMON893 to create pMON10827. The synthetic 
truncated HD-73 gene from pMON5383 was also fused with the 
signal peptide sequence of PR1b to create pMON10824. The 
full length synthetic HD-73 gene from pMON10518 was 
fused with the signal peptide sequence of PvuB and 
incorporated into pMON893 to create pMON10828. The full 
length synthetic HD-73 gene from pMON10518 was also 
fused with the signal peptide sequence of PR1b and 
incorporated into pMON893 to create pMON10825. 

These vectors were used to transform tobacco plants and 
the plants were assayed for expression of the B.t.k. protein 
by Western blot analysis and for insecticidal efficacy. 
pMON10824 and pMON10827 produced amounts of B.t.k. 
protein in leaf comparable to the truncated HD-73 vectors, 
pMON5383 and pMON5390. pMON10825 and 
pMON10828 produced full length B.t.k. protein in amounts 
comparable to pMON10518. In all cases, the plants were 
insecticidally active against tobacco hornworm. 
SEQUENCE LISTING 

<160> 27 

<210> 1 
<211> 18 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 1 
tccccagata atatcaac 18 

<210> 2 
<211> 48 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 2 
ggcttgattc ctagcgaact cttcgattct ctggttgatg agctgttc 48 

<210> 3 
<211> 54 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 3 
caaaactgag aggtggaggt tggcagcttg aacgtacacg gagaggagag gaac 54 

<210> 4 
<211> 48 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 4 
agttagtgta agctctcttc tgaactggtt gtacctgatc caatctct 48 



<210> 5 
<211> 39 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 5 
agccatgatc tggtgaccgg accagtagta ttctcctct 39 

<210> 6 
<211> 32 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 6 
agttgttggt tgttgatccc gatgttaaaa gg 32 

<210> 7 
<211> 37 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 7 
gtgatgaagg gatgatgttg ttgaactcag cactacg 37 

<210> 8 
<211> 100 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 8 
cagaagttcc agagccaaga ttagtagact tggtgagtgg gatttgggtg atttgtgatg 60 
aagggatgat gttgttgaac tcagcactac gatgtatcca 100 

<210> 9 
<211> 27 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 9 
tgatgtgtgg aactgaaggt ttgtggt 27 



<210> 10 
<211> 50 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 10 
aatactatcg gatgcgatga tgttgttgaa ctcagcacta cggtgtatcc 50 

<210> 11 
<211> 33 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 11 
tcctgaaatg acagaaccgt tgaagagaaa gtt 33 



<210> 12 
<211> 48 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 12 
atttccactg ctgttgagtc taacgaggtc tccaccagtg aatcctgg 48 

<210> 13 
<211> 60 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 13 
gtgaataggg gtcacagaag catacctcac acgaactcta tatctggtag atgttggatg 60 

<210> 14 
<211> 33 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 14 
tgtagctgga actgtattgg agaagatgga tga 33 

<210> 15 
<211> 48 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 15 
ttcaaagtaa ccgaaatcgc tggattggag attatccaag gaggtagc 48 



<210> 16 
<211> 39 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 16 
actaaagttt ctaacaccca cgatgttacc gagtgaaga 39 

<210> 17 
<211> 36 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 17 
aactggaatg aactcgaatc tgtcgataat cactcc 36 

<210> 18 
<211> 54 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: mutagenesis primer 
<400> 18 
ggacactaga tcttagtgat aatcggtcac atttgtcttg agtccaagct ggtt 54 

<210> 19 
<211> 10 
<212> PRT 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: peptide 
      sequence reiterates the native small subunit 
      promoter chloroplast transit peptide cleavage site 
<400> 19 
Gly Gly Arg Val Asn Cys Met Gln Ala Met 
1               5                   10 

<210> 20 
<211> 1743 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: synthetic 
      structural gene encoding B.t.k. HD-1 insecticidal 
      protein 
<400> 20 

atggctatag aaactggtta caccccaatc gatatttcct tgtcgctaac gcaatttctt 60 

ttgagtgaat ttgttcccgg tgctggattt gtgttaggac tagttgatat tatctgggga 120 

atttttggtc cctctcaatg ggacgcattt cttgtacaaa ttgaacagct catcaaccag 180 

agaatcgaag agttcgctag gaatcaagcc atttctagat tagaaggact aagcaatctt 240 

tatcaaattt acgcagaatc ttttagagag tgggaagcag atcctactaa tccagcatta 300 

agagaagaga tgcgtattca attcaatgac atgaacagtg cccttacaac cgctattcct 360 

ctttttgcag ttcaaaatta tcaagttcct ctcctctccg tgtacgttca agctgccaac 420 

ctccacctct cagttttgag agatgtttca gtgtttggac aaaggtgggg atttgatgcc 480 

gcgactatca atagtcgtta taatgattta actaggctta ttggcaacta tacagatcat 540 

gctgtacgct ggtacaatac gggattagag cgtgtatggg gaccggattc tagagattgg 600 

atcaggtaca accagttcag aagagagctt acactaactg tattagatat cgtttctcta 660 

tttccgaact atgatagtag aacgtatcca attcgaacag tttcccaatt aacaagagaa 720 

atttatacaa acccagtatt agaaaatttt gatggtagtt ttcgaggctc ggctcagggc 780 

atagaaggaa gtattaggag tccacatttg atggatatac ttaatagtat aaccatctat 840 

acggatgctc atagaggaga atactactgg tccggtcacc agatcatggc ttctcctgta 900 

gggttttcgg ggccagaatt cacttttccg ctatatggaa ctatgggaaa tgcagctcca 960 

caacaacgta ttgttgctca actaggtcag ggcgtgtata gaacattatc gtccacctta 1020 

tatagaagac cttttaacat cgggatcaac aaccaacaac tatctgttct tgacgggaca 1080 

gaatttgctt atggaacctc ctcaaatttg ccatccgctg tatacagaaa aagcggaacg 1140 

gtagattcgc tggatgaaat accgccacag aataacaacg tgccacctag gcaaggattt 1200 

agtcatcgat taagccatgt ttcaatgttt cgttcaggct ttagtaatag tagtgtaagt 1260 

ataataagag ctcctatgtt ctcttggata catcgtagtg ctgagttcaa caacatcatc 1320 

ccttcatcac aaatcaccca aatcccactc accaagtcta ctaatcttgg ctctggaact 1380 

tctgtcgtta aaggaccagg atttacagga ggagatattc ttcgaagaac ttcacctggc 1440 

cagatttcaa ccttaagagt aaatattact gcaccattat cacaaagata tcgggtaaga 1500 

attcgctacg cttctaccac aaaccttcag ttccacacat caattgacgg aagacctatt 1560 

aatcagggga atttttcagc aactatgagt agtgggagta atttacagtc cggaagcttt 1620 

aggactgtag gttttactac tccgtttaac ttttcaaatg gatcaagtgt atttacgtta 1680 

agtgctcatg tcttcaattc aggcaatgaa gtttatatag atcgaattga atttgttccg 1740 

gca 1743 

<210> 21 
<211> 1767 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: synthetic 
      structural gene encoding insecticidal protein of 
      B.t.k. HD-73 
<400> 21 

atggccattg aaaccggtta cactcccatc gacatctcct tgtccttgac acagtttctg 60 

ctcagcgagt tcgtgccagg tgctgggttc gttctcggac tagttgacat catctggggt 120 

atctttggtc catctcaatg ggatgcattc ctggtgcaaa ttgagcagtt gatcaaccag 180 

aggatcgaag agttcgccag gaaccaggcc atctctaggt tggaaggatt gagcaatctc 240 

taccaaatct atgcagagag cttcagagag tgggaagccg atcctactaa cccagctctc 300 

cgcgaggaaa tgcgtattca attcaacgac atgaacagcg ccttgaccac agctatccca 360 

ttgttcgcag tccagaacta ccaagttcct ctcttgtccg tgtacgttca agcagctaat 420 

cttcacctca gcgtgcttcg agacgttagc gtgtttgggc aaaggtgggg attcgatgct 480 

gcaaccatca atagccgtta caacgacctt actaggctga ttggaaacta caccgaccac 540 

gctgttcgtt ggtacaacac tggcttggag cgtgtctggg gtcctgattc tagagattgg 600 

attagataca accagttcag gagagaattg accctcacag ttttggacat tgtgtctctc 660 

ttcccgaact atgactccag aacctaccct atccgtacag tgtcccaact taccagagaa 720 

atctatacta acccagttct tgagaacttc gacggtagct tccgtggttc tgcccaaggt 780 

atcgaaggct ccatcaggag cccacacttg atggacatct tgaacagcat aactatctac 840 

accgatgctc acagaggaga gtattactgg tctggacacc agatcatggc ctctccagtt 900 

ggattcagcg ggcccgagtt tacctttcct ctctatggaa ctatgggaaa cgccgctcca 960 

caacaacgta tcgttgctca actaggtcag ggtgtctaca gaaccttgtc ttccaccttg 1020 

tacagaagac ccttcaatat cggtatcaac aaccagcaac tttccgttct tgacggaaca 1080 

gagttcgcct atggaacctc ttctaacttg ccatccgctg tttacagaaa gagcggaacc 1140 

gttgattcct tggacgaaat cccaccacag aacaacaatg tgccacccag gcaaggattc 1200 

tcccacaggt tgagccacgt gtccatgttc cgttccggat tcagcaacag ttccgtgagc 1260 

atcatcagag ctcctatgtt ctcttggata caccgtagtg ctgagttcaa caacatcatc 1320 

gcatccgata gtattactca aatccctgca gtgaagggaa actttctctt caacggttct 1380 

gtcatttcag gaccaggatt cactggtgga gacctcgtta gactcaacag cagtggaaat 1440 

aacattcaga atagagggta tattgaagtt ccaattcact tcccatccac atctaccaga 1500 

tatagagttc gtgtgaggta tgcttctgtg acccctattc acctcaacgt taattggggt 1560 

aattcatcca tcttctccaa tacagttcca gctacagcta cctccttgga taatctccaa 1620 

tccagcgatt tcggttactt tgaaagtgcc aatgctttta catcttcact cggtaacatc 1680 

gtgggtgtta gaaactttag tgggactgca ggagtgatta tcgacagatt cgagttcatt 1740 

ccagttactg caacactcga ggctgag 1767 

<210> 22 
<211> 1845 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: synthetic 
      structural gene encoding insecticidal protein of 
      B.t.k. HD-1 
<400> 22 

atggacaaca acccaaacat caacgaatgc attccataca actgcttgag taacccagaa 60 

gttgaagtac ttggtggaga acgcattgaa accggttaca ctcccatcga catctccttg 120 

tccttgacac agtttctgct cagcgagttc gtgccaggtg ctgggttcgt tctcggacta 180 

gttgacatca tctggggtat ctttggtcca tctcaatggg atgcattcct ggtgcaaatt 240 

gagcagttga tcaaccagag gatcgaagag ttcgccagga accaggccat ctctaggttg 300 

gaaggattga gcaatctcta ccaaatctat gcagagagct tcagagagtg ggaagccgat 360 

cctactaacc cagctctccg cgaggaaatg cgtattcaat tcaacgacat gaacagcgcc 420 

ttgaccacag ctatcccatt gttcgcagtc cagaactacc aagttcctct cttgtccgtg 480 

tacgttcaag cagctaatct tcacctcagc gtgcttcgag acgttagcgt gtttgggcaa 540 

aggtggggat tcgatgctgc aaccatcaat agccgttaca acgaccttac taggctgatt 600 

ggaaactaca ccgaccacgc tgttcgttgg tacaacactg gcttggagcg tgtctggggt 660 

cctgattcta gagattggat tagatacaac cagttcagga gagaattgac cctcacagtt 720 

ttggacattg tgtctctctt cccgaactat gactccagaa cctaccctat ccgtacagtg 780 

tcccaactta ccagagaaat ctatactaac ccagttcttg agaacttcga cggtagcttc 840 

cgtggttctg cccaaggtat cgaaggctcc atcaggagcc cacacttgat ggacatcttg 900 

aacagcataa ctatctacac cgatgctcac agaggagagt attactggtc tggacaccag 960 

atcatggcct ctccagttgg attcagcggg cccgagttta cctttcctct ctatggaact 1020 

atgggaaacg ccgctccaca acaacgtatc gttgctcaac taggtcaggg tgtctacaga 1080 

accttgtctt ccaccttgta cagaagaccc ttcaatatcg gtatcaacaa ccagcaactt 1140 

tccgttcttg acggaacaga gttcgcctat ggaacctctt ctaacttgcc atccgctgtt 1200 

tacagaaaga gcggaaccgt tgattccttg gacgaaatcc caccacagaa caacaatgtg 1260 

ccacccaggc aaggattctc ccacaggttg agccacgtgt ccatgttccg ttccggattc 1320 

agcaacagtt ccgtgagcat catcagagct cctatgttct catggattca tcgtagtgct 1380 

gagttcaaca atatcattcc ttcctctcaa atcacccaaa tcccattgac caagtctact 1440 

aaccttggat ctggaacttc tgtcgtgaaa ggaccaggct tcacaggagg tgatattctt 1500 

agaagaactt ctcctggcca gattagcacc ctcagagtta acatcactgc accactttct 1560 

caaagatatc gtgtcaggat tcgttacgca tctaccacta acttgcaatt ccacacctcc 1620 

atcgacggaa ggcctatcaa tcagggtaac ttctccgcaa ccatgtcaag cggcagcaac 1680 

ttgcaatccg gcagcttcag aaccgtcggt ttcactactc ctttcaactt ctctaacgga 1740 

tcaagcgttt tcacccttag cgctcatgtg ttcaattctg gcaatgaagt gtacattgac 1800 

cgtattgagt ttgtgcctgc cgaagttacc ttcgaggctg agtac 1845 

<210> 23 
<211> 1921 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: synthetic 
      structural gene encoding insecticidal protein 
      derived from B.t.k. HD-73 
<400> 23 

atggacaaca acccaaacat caacgaatgc attccataca actgcttgag taacccagaa 60 

gttgaagtac ttggtggaga acgcattgaa accggttaca ctcccatcga catctccttg 120 

tccttgacac agtttctgct cagcgagttc gtgccaggtg ctgggttcgt tctcggacta 180 

gttgacatca tctggggtat ctttggtcca tctcaatggg atgcattcct ggtgcaaatt 240 

gagcagttga tcaaccagag gatcgaagag ttcgccagga accaggccat ctctaggttg 300 

gaaggattga gcaatctcta ccaaatctat gcagagagct tcagagagtg ggaagccgat 360 

cctactaacc cagctctccg cgaggaaatg cgtattcaat tcaacgacat gaacagcgcc 420 

ttgaccacag ctatcccatt gttcgcagtc cagaactacc aagttcctct cttgtccgtg 480 

tacgttcaag cagctaatct tcacctcagc gtgcttcgag acgttagcgt gtttgggcaa 540 

aggtggggat tcgatgctgc aaccatcaat agccgttaca acgaccttac taggctgatt 600 

ggaaactaca ccgaccacgc tgttcgttgg tacaacactg gcttggagcg tgtctggggt 660 

cctgattcta gagattggat tagatacaac cagttcagga gagaattgac cctcacagtt 720 

ttggacattg tgtctctctt cccgaactat gactccagaa cctaccctat ccgtacagtg 780 

tcccaactta ccagagaaat ctatactaac ccagttcttg agaacttcga cggtagcttc 840 

cgtggttctg cccaaggtat cgaaggctcc atcaggagcc cacacttgat ggacatcttg 900 

aacagcataa ctatctacac cgatgctcac agaggagagt attactggtc tggacaccag 960 

atcatggcct ctccagttgg attcagcggg cccgagttta cctttcctct ctatggaact 1020 

atgggaaacg ccgctccaca acaacgtatc gttgctcaac taggtcaggg tgtctacaga 1080 

accttgtctt ccaccttgta cagaagaccc ttcaatatcg gtatcaacaa ccagcaactt 1140 

tccgttcttg acggaacaga gttcgcctat ggaacctctt ctaacttgcc atccgctgtt 1200 

tacagaaaga gcggaaccgt tgattccttg gacgaaatcc caccacagaa caacaatgtg 1260 

ccacccaggc aaggattctc ccacaggttg agccacgtgt ccatgttccg ttccggattc 1320 
agcaacagtt ccgtgagcat catcagagct cctatgttct cttggataca ccgtagtgct 1380 

gagttcaaca acatcatcgc atccgatagt attactcaaa tccctgcagt gaagggaaac 1440 

tttctcttca acggttctgt catttcagga ccaggattca ctggtggaga cctcgttaga 1500 

ctcaacagca gtggaaataa cattcagaat agagggtata ttgaagttcc aattcacttc 1560 

ccatccacat ctaccagata tagagttcgt gtgaggtatg cttctgtgac ccctattcac 1620 

ctcaacgtta attggggtaa ttcatccatc ttctccaata cagttccagc tacagctacc 1680 

tccttggata atctccaatc cagcgatttc ggttactttg aaagtgccaa tgcttttaca 1740 

tcttcactcg gtaacatcgt gggtgttaga aactttagtg ggactgcagg agtgattatc 1800 

gacagattcg agttcattcc agttactgca acactcgagg ctgaatataa tctggaaaga 1860 

gcgcagaagg cggtaatgcg ctgtttacgt ctacaaacca gcttggactc aagacaaatg 1920 

g 1921 

<210> 24 
<211> 3534 
<212> DNA 
<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: synthetic 
      structural gene encoding insecticidal protein 
      derived from B.t.k. HD-73 
<400> 24 

atggacaaca acccaaacat caacgaatgc attccataca actgcttgag taacccagaa 60 

gttgaagtac ttggtggaga acgcattgaa accggttaca ctcccatcga catctccttg 120 

tccttgacac agtttctgct cagcgagttc gtgccaggtg ctgggttcgt tctcggacta 180 

gttgacatca tctggggtat ctttggtcca tctcaatggg atgcattcct ggtgcaaatt 240 

gagcagttga tcaaccagag gatcgaagag ttcgccagga accaggccat ctctaggttg 300 

gaaggattga gcaatctcta ccaaatctat gcagagagct tcagagagtg ggaagccgat 360 

cctactaacc cagctctccg cgaggaaatg cgtattcaat tcaacgacat gaacagcgcc 420 

ttgaccacag ctatcccatt gttcgcagtc cagaactacc aagttcctct cttgtccgtg 480 

tacgttcaag cagctaatct tcacctcagc gtgcttcgag acgttagcgt gtttgggcaa 540 

aggtggggat tcgatgctgc aaccatcaat agccgttaca acgaccttac taggctgatt 600 

ggaaactaca ccgaccacgc tgttcgttgg tacaacactg gcttggagcg tgtctggggt 660 

cctgattcta gagattggat tagatacaac cagttcagga gagaattgac cctcacagtt 720 

ttggacattg tgtctctctt cccgaactat gactccagaa cctaccctat ccgtacagtg 780 

tcccaactta ccagagaaat ctatactaac ccagttcttg agaacttcga cggtagcttc 840 

cgtggttctg cccaaggtat cgaaggctcc atcaggagcc cacacttgat ggacatcttg 900 

aacagcataa ctatctacac cgatgctcac agaggagagt attactggtc tggacaccag 960 

atcatggcct ctccagttgg attcagcggg cccgagttta cctttcctct ctatggaact 1020 

atgggaaacg ccgctccaca acaacgtatc gttgctcaac taggtcaggg tgtctacaga 1080 

accttgtctt ccaccttgta cagaagaccc ttcaatatcg gtatcaacaa ccagcaactt 1140 

tccgttcttg acggaacaga gttcgcctat ggaacctctt ctaacttgcc atccgctgtt 1200 

tacagaaaga gcggaaccgt tgattccttg gacgaaatcc caccacagaa caacaatgtg 1260 

ccacccaggc aaggattctc ccacaggttg agccacgtgt ccatgttccg ttccggattc 1320 

agcaacagtt ccgtgagcat catcagagct cctatgttct cttggataca ccgtagtgct 1380 

gagttcaaca acatcatcgc atccgatagt attactcaaa tccctgcagt gaagggaaac 1440 

tttctcttca acggttctgt catttcagga ccaggattca ctggtggaga cctcgttaga 1500 

ctcaacagca gtggaaataa cattcagaat agagggtata ttgaagttcc aattcacttc 1560 

ccatccacat ctaccagata tagagttcgt gtgaggtatg cttctgtgac ccctattcac 1620 

ctcaacgtta attggggtaa ttcatccatc ttctccaata cagttccagc tacagctacc 1680 

tccttggata atctccaatc cagcgatttc ggttactttg aaagtgccaa tgcttttaca 1740 

tcttcactcg gtaacatcgt gggtgttaga aactttagtg ggactgcagg agtgattatc 1800 

gacagattcg agttcattcc agttactgca acactcgagg ctgaatataa tctggaaaga 1860 

gcgcagaagg cggtgaatgc gctgtttacg tctacaaacc agctcggcct caagaccaat 1920 

gtgacggatt atcatattga tcaagtgtcc aacttggtga cctacctcag cgatgagttc 1980 

tgtctggatg aaaagcgaga attgtccgag aaagtcaaac atgcgaagcg actcagtgat 2040 

gaacgcaatt tactccaaga ttcaaatttc aaagacatta ataggcaacc agaacgtggg 2100 

tggggcggaa gtacagggat taccatccag ggaggtgacg acgtgttcaa ggagaactac 2160 

gtcacactat caggtacctt tgatgagtgc tatccaacat acctctacca gaagatcgac 2220 

gagtccaagt tgaaagcctt tacccgttat caattaagag ggtatatcga agatagtcaa 2280 

gacctcgaga tctacctcat ccgctacaat gcaaaacatg aaacagtaaa tgtgccaggt 2340 

acgggttcct tatggccgct ttcagcccaa agtccaatcg gaaagtgtgg agagccgaat 2400 

cgatgcgcgc cacaccttga atggaatcct gacttagatt gttcgtgtag ggatggagaa 2460 

aagtgtgccc atcattcgca tcatttctcc ttagacattg atgtaggatg tacagactta 2520 

aatgaggacc taggtgtatg ggtgatcttt aagattaaga cgcaagatgg gcacgcaaga 2580 

ctagggaatc tagagtttct cgaagagaaa ccattagtag gagaagcgct agctcgtgtg 2640 

aaaagagcgg agaaaaaatg gagagacaaa cgtgagaagt tggaatggga gaccaacatc 2700 

gtctacaaag aggcaaaaga atctgtagat gctttatttg taaactctca atatgatcaa 2760 

ttacaagcgg atacgaatat tgccatgatt catgcggcag ataaacgtgt tcatagcatt 2820 

cgagaagctt atctgcctga gctgtctgtg attccgggtg tcaatgcggc tatttttgaa 2880 

gaattagaag ggcgtatttt cactgcattc tccctctacg atgccagaaa cgtcatcaag 2940 

aacggtgact tcaacaatgg cttatcctgc tggaacgtga aagggcatgt agatgtagaa 3000 

gaacaaaaca accaacgttc ggtccttgtt gttccggaat gggaagcaga agtgtcacaa 3060 

gaagttcgtg tctgtccggg tcgtggctat atccttcgtg tcacagcgta caaggaggga 3120 

tatggagaag gttgcgtaac cattcatgag atcgagaaca atacagacga actgaagttt 3180 

agcaactgcg tagaagagga aatctatcca aataacacgg taacgtgtaa tgattatact 3240 

gtaaatcaag aagaatacgg aggtgcgtac acttctcgta atcgaggata taacgaagct 3300 

ccttccgtac cagctgatta tgcgtcagtc tatgaagaaa aatcgtatac agatggacga 3360 

agagagaatc cttgtgaatt taacagaggg tatagggatt acacgccact accagttggt 3420 

tatgtgacaa aagaattaga atacttccca gaaaccgata aggtatggat tgagattgga 3480 

gaaacggaag gaacatttat cgtggacagc gtggaattac tccttatgga ggaa 3534 
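Because each full row of the listing carries six ten-base groups followed by a running position counter, scanning damage in a row can be detected mechanically. The following is a minimal sketch in Python, not part of the original exhibit; the function name `check_listing` and the two problem labels are illustrative assumptions. It flags rows that contain characters other than a, c, g, t and rows whose cumulative length disagrees with the printed counter.

```python
import re

BASES = re.compile(r"[acgt]+")

def check_listing(lines):
    """Return (sequence, problems) for the numbered rows of a sequence listing."""
    sequence = []
    problems = []
    total = 0
    for raw in lines:
        parts = raw.split()
        # Skip blank lines, "<210>"-style fields, free-text description lines,
        # and rows whose counter is unreadable (the next row then mismatches).
        if len(parts) < 2 or parts[0].startswith("<") or not parts[-1].isdigit():
            continue
        bases = "".join(parts[:-1])
        end_position = int(parts[-1])
        total += len(bases)
        if not BASES.fullmatch(bases):
            problems.append((end_position, "non-base character"))
        if total != end_position:
            problems.append((end_position, "length does not match counter"))
        sequence.append(bases)
    return "".join(sequence), problems

# First two rows of SEQ ID NO:24 as printed above.
rows = [
    "atggacaaca acccaaacat caacgaatgc attccataca actgcttgag taacccagaa 60",
    "",
    "gttgaagtac ttggtggaga acgcattgaa accggttaca ctcccatcga catctccttg 120",
]
seq, problems = check_listing(rows)
print(len(seq), problems)
```

A row in which a scanner has substituted "l" for "t", for example, would be reported with the label "non-base character" together with its printed end position.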

<210> 25 

<211> 3534 

<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Description of Artificial Sequence: synthetic 
structural gene encoding insecticidal protein 
derived from B.t.k. HD-73 

<400> 25 

atggacaaca acccaaacat caacgaatgc attccataca actgcttgag taacccagaa 60 

gttgaagtac ttggtggaga acgcattgaa accggttaca ctcccatcga catctccttg 120 

tccttgacac agtttctgct cagcgagttc gtgccaggtg ctgggttcgt tctcggacta 180 

gttgacatca tctggggtat ctttggtcca tctcaatggg atgcattcct ggtgcaaatt 240 

gagcagttga tcaaccagag gatcgaagag ttcgccagga accaggccat ctctaggttg 300 

gaaggattga gcaatctcta ccaaatctat gcagagagct tcagagagtg ggaagccgat 360 

cctactaacc cagctctccg cgaggaaatg cgtattcaat tcaacgacat gaacagcgcc 420 

ttgaccacag ctatcccatt gttcgcagtc cagaactacc aagttcctct cttgtccgtg 480 

tacgttcaag cagctaatct tcacctcagc gtgcttcgag acgttagcgt gtttgggcaa 540 

aggtggggat tcgatgctgc aaccatcaat agccgttaca acgaccttac taggctgatt 600 

ggaaactaca ccgaccacgc tgttcgttgg tacaacactg gcttggagcg tgtctggggt 660 

cctgattcta gagattggat tagatacaac cagttcagga gagaattgac cctcacagtt 720 

ttggacattg tgtctctctt cccgaactat gactccagaa cctaccctat ccgtacagtg 780 

tcccaactta ccagagaaat ctatactaac ccagttcttg agaacttcga cggtagcttc 840 

cgtggttctg cccaaggtat cgaaggctcc atcaggagcc cacacttgat ggacatcttg 900 

aacagcataa ctatctacac cgatgctcac agaggagagt attactggtc tggacaccag 960 

atcatggcct ctccagttgg attcagcggg cccgagttta cctttcctct ctatggaact 1020 

atgggaaacg ccgctccaca acaacgtatc gttgctcaac taggtcaggg tgtctacaga 1080 

accttgtctt ccaccttgta cagaagaccc ttcaatatcg gtatcaacaa ccagcaactt 1140 

tccgttcttg acggaacaga gttcgcctat ggaacctctt ctaacttgcc atccgctgtt 1200 

tacagaaaga gcggaaccgt tgattccttg gacgaaatcc caccacagaa caacaatgtg 1260 

ccacccaggc aaggattctc ccacaggttg agccacgtgt ccatgttccg ttccggattc 1320 

agcaacagtt ccgtgagcat catcagagct cctatgttct cttggataca ccgtagtgct 1380 

gagttcaaca acatcatcgc atccgatagt attactcaaa tccctgcagt gaagggaaac 1440 

tttctcttca acggttctgt catttcagga ccaggattca ctggtggaga cctcgttaga 1500 

ctcaacagca gtggaaataa cattcagaat agagggtata ttgaagttcc aattcacttc 1560 

ccatccacat ctaccagata tagagttcgt gtgaggtatg cttctgtgac ccctattcac 1620 

ctcaacgtta attggggtaa ttcatccatc ttctccaata cagttccagc tacagctacc 1680 

tccttggata atctccaatc cagcgatttc ggttactttg aaagtgccaa tgcttttaca 1740 

tcttcactcg gtaacatcgt gggtgttaga aactttagtg ggactgcagg agtgattatc 1800 

gacagattcg agttcattcc agttactgca acactcgagg ctgaatataa tctggaaaga 1860 

gcgcagaagg cggtgaatgc gctgtttacg tctacaaacc aactagggct aaaaacaaat 1920 

gtaacggatt atcatattga tcaagtgtcc aatttagtta cgtatttatc ggatgaattt 1980 

tgtctggatg aaaagcgaga attgtccgag aaagtcaaac atgcgaagcg actcagtgat 2040 

gaacgcaatt tactccaaga ttcaaatttc aaagacatta ataggcaacc agaacgtggg 2100 

tggggcggaa gtacagggat taccatccaa ggaggggatg acgtatttaa agaaaattac 2160 

gtcacactat caggtacctt tgatgagtgc tatccaacat atttgtatca aaaaatcgat 2220 

<210> 26 

<211> 3534 

<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Description of Artificial Sequence: synthetic 
structural gene encoding insecticidal protein 
derived from B.t.k. HD-73 

cctgattcta gagattggat tagatacaac cagttcagga gagaattgac cctcacagtt 720 

ttggacattg tgtctctctt cccgaactat gactccagaa cctaccctat ccgtacagtg 780 

tcccaactta ccagagaaat ctatactaac ccagttcttg agaacttcga cggtagcttc 840 

cgtggttctg cccaaggtat cgaaggctcc atcaggagcc cacacttgat ggacatcttg 900 

aacagcataa ctatctacac cgatgctcac agaggagagt attactggtc tggacaccag 960 

atcatggcct ctccagttgg attcagcggg cccgagttta cctttcctct ctatggaact 1020 

atgggaaacg ccgctccaca acaacgtatc gttgctcaac taggtcaggg tgtctacaga 1080 

accttgtctt ccaccttgta cagaagaccc ttcaatatcg gtatcaacaa ccagcaactt 1140 

tccgttcttg acggaacaga gttcgcctat ggaacctctt ctaacttgcc atccgctgtt 1200 

tacagaaaga gcggaaccgt tgattccttg gacgaaatcc caccacagaa caacaatgtg 1260 

ccacccaggc aaggattctc ccacaggttg agccacgtgt ccatgttccg ttccggattc 1320 

agcaacagtt ccgtgagcat catcagagct cctatgttct cttggataca ccgtagtgct 1380 

gagttcaaca acatcatcgc atccgatagt attactcaaa tccctgcagt gaagggaaac 1440 

tttctcttca acggttctgt catttcagga ccaggattca ctggtggaga cctcgttaga 1500 

ctcaacagca gtggaaataa cattcagaat agagggtata ttgaagttcc aattcacttc 1560 

ccatccacat ctaccagata tagagttcgt gtgaggtatg cttctgtgac ccctattcac 1620 

ctcaacgtta attggggtaa ttcatccatc ttctccaata cagttccagc tacagctacc 1680 

tccttggata atctccaatc cagcgatttc ggttactttg aaagtgccaa tgcttttaca 1740 

tcttcactcg gtaacatcgt gggtgttaga aactttagtg ggactgcagg agtgattatc 1800 

gacagattcg agttcattcc agttactgca acactcgagg ctgagtacaa ccttgagaga 1860 

gcccagaagg ctgtgaacgc cctctttacc tccaccaatc agcttggctt gaaaactaac 1920 

gttactgact atcacattga ccaagtgtcc aacttggtca cctaccttag cgatgagttc 1980 

tgcctcgacg agaagcgtga actctccgag aaagttaaac acgccaagcg tctcagcgac 2040 

gagaggaatc tcttgcaaga ctccaacttc aaagacatca acaggcagcc agaacgtggt 2100 

ieSS&*^S&^^ gcaccgggat caccatccaa ggaggcgacg atgtgttcaa ggagaactac 2160 

gtcaccctct ccggaacttt cgacgagtgc taccctacct acttgtacca gaagatcgat 2220 

gagtccaaac tcaaagcctt caccaggtat caacttagag gctacatcga agacagccaa 2280 

gaccttgaaa tctactcgat caggtacaat gccaagcacg agaccgtgaa tgtcccaggt 2340 

actggttccc tctggccact ttctgcccaa tctcccattg ggaagtgtgg agagcctaac 2400 

agatgcgctc cacaccttga gtggaatcct gacttggact gctcctgcag ggatggcgag 2460 

aagtgtgccc accattctca tcacttctcc ttggacatcg atgtgggatg tactgacctg 2520 

aatgaggacc tcggagtctg ggtcatcttc aagatcaaga cccaagacgg acacgcaaga 2580 

cttggcaacc ttgagtttct cgaagagaaa ccattggtcg gtgaagctct cgctcgtgtg 2640 

aagagagcag agaagaagtg gagggacaaa cgtgagaaac tcgaatggga aactaacatc 2700 

gtttacaagg aggccaaaga gtccgtggat gctttgttcg tgaactccca atatgatcag 2760 

ttgcaagccg acaccaacat cgccatgatc cacgccgcag acaaacgtgt gcacagcatt 2820 

cgtgaggctt acttgcctga gttgtccgtg atccctggtg tgaacgctgc catcttcgag 2880 

gaacttgagg gacgtatctt taccgcattc tccttgtacg atgccagaaa cgtcatcaag 2940 

aacggtgact tcaacaatgg cctcagctgc tggaatgtga aaggtcatgt ggacgtggag 3000 

gaacagaaca atcagcgttc cgtcctggtt gtgcctgagt gggaagctga agtgtcccaa 3060 

gaggttagag tctgtccagg tagaggctac attctccgtg tgaccgctta caaggaggga 3120 

tacggtgagg gttgcgtgac catccacgag atcgagaaca acaccgacga gcttaagttc 3180 

tccaactgcg tcgaggaaga aatctatccc aacaacaccg ttacttgcaa cgactacact 3240 

gtgaatcagg aagagtacgg aggtgcctac actagccgta acagaggtta caacgaagct 3300 

ccttccgttc ctgctgacta tgcctccgtg tacgaggaga aatcctacac agatggcaga 3360 

cgtgagaacc cttgcgagtt caacagaggt tacagggact acacaccact tccagttggc 3420 

tatgttacca aggagcttga gtactttcct gagaccgaca aagtgtggat cgagatcggt 3480 

gaaaccgagg gaaccttcat cgtggacagc gtggagcttc tcttgatgga ggaa 3534 

<210> 27 

<211> 3531 

<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Description of Artificial Sequence: synthetic 
structural gene encoding a fusion protein derived 
from B.t.k. HD-1 and B.t.k. HD-73 

<400> 27 

atggacaaca acccaaacat caacgaatgc attccataca actgcttgag taacccagaa 60 

gttgaagtac ttggtggaga acgcattgaa accggttaca ctcccatcga catctccttg 120 

tccttgacac agtttctgct cagcgagttc gtgccaggtg ctgggttcgt tctcggacta 180 

gttgacatca tctggggtat ctttggtcca tctcaatggg atgcattcct ggtgcaaatt 240 

gagcagttga tcaaccagag gatcgaagag ttcgccagga accaggccat ctctaggttg 300 

gaaggattga gcaatctcta ccaaatctat gcagagagct tcagagagtg ggaagccgat 360 

cctactaacc cagctctccg cgaggaaatg cgtattcaat tcaacgacat gaacagcgcc 420 

ttgaccacag ctatcccatt gttcgcagtc cagaactacc aagttcctct cttgtccgtg 480 

tacgttcaag cagctaatct tcacctcagc gtgcttcgag acgttagcgt gtttgggcaa 540 

aggtggggat tcgatgctgc aaccatcaat agccgttaca acgaccttac taggctgatt 600 

ggaaactaca ccgaccacgc tgttcgttgg tacaacactg gcttggagcg tgtctggggt 660 

cctgattcta gagattggat tagatacaac cagttcagga gagaattgac cctcacagtt 720 

ttggacattg tgtctctctt cccgaactat gactccagaa cctaccctat ccgtacagtg 780 

tcccaactta ccagagaaat ctatactaac ccagttcttg agaacttcga cggtagcttc 840 

cgtggttctg cccaaggtat cgaaggctcc atcaggagcc cacacttgat ggacatcttg 900 

aacagcataa ctatctacac cgatgctcac agaggagagt attactggtc tggacaccag 960 

atcatggcct ctccagttgg attcagcggg cccgagttta cctttcctct ctatggaact 1020 

atgggaaacg ccgctccaca acaacgtatc gttgctcaac taggtcaggg tgtctacaga 1080 

accttgtctt ccaccttgta cagaagaccc ttcaatatcg gtatcaacaa ccagcaactt 1140 

tccgttcttg acggaacaga gttcgcctat ggaacctctt ctaacttgcc atccgctgtt 1200 

tacagaaaga gcggaaccgt tgattccttg gacgaaatcc caccacagaa caacaatgtg 1260 

ccacccaggc aaggattctc ccacaggttg agccacgtgt ccatgttccg ttccggattc 1320 

agcaacagtt ccgtgagcat catcagagct cctatgttct catggattca tcgtagtgct 1380 

gagttcaaca atatcattcc ttcctctcaa atcacccaaa tcccattgac caagtctact 1440 

aaccttggat ctggaacttc tgtcgtgaaa ggaccaggct tcacaggagg tgatattctt 1500 

agaagaactt ctcctggcca gattagcacc ctcagagtta acatcactgc accactttct 1560 

We claim: 

1. A heterologous gene construct comprising a structural 
coding sequence which encodes an insecticidal protein 
derived from B.t.k. HD-1, said structural coding sequence 
comprising SEQ ID NO:20. 

2. A heterologous gene construct comprising a structural 
coding sequence which encodes an insecticidal protein 
derived from B.t.k. HD-73, said structural coding sequence 
comprising SEQ ID NO:21. 

3. A heterologous gene construct comprising a structural 
coding sequence which encodes an insecticidal protein 
derived from B.t.k. HD-1, said structural coding sequence 
comprising SEQ ID NO:22. 

4. A heterologous gene construct comprising a structural 
coding sequence which encodes an insecticidal protein 
derived from B.t.k. HD-73, said structural coding sequence 
comprising SEQ ID NO:23. 

5. A heterologous gene construct comprising a structural 
coding sequence which encodes an insecticidal protein 
derived from B.t.k. HD-73, said structural coding sequence 
comprising SEQ ID NO:24. 

6. A heterologous gene construct comprising a structural 
coding sequence which encodes an insecticidal protein 
derived from B.t.k. HD-73, said structural coding sequence 
comprising SEQ ID NO:25. 

7. A heterologous gene construct comprising a structural 
coding sequence which encodes an insecticidal protein 
derived from B.t.k. HD-73, said structural coding sequence 
comprising SEQ ID NO:26. 

8. A heterologous gene construct comprising a structural 
coding sequence which encodes a fusion protein comprising 
the N-terminal 610 amino acids of a toxin protein derived 
from B.t.k. HD-1 and the C-terminal 567 amino acids of a 
toxin protein derived from B.t.k. HD-73, said structural 
coding sequence comprising SEQ ID NO:27. 



EXHIBIT E 

'105 Application Claim 40 

40. A synthetic gene which is derived from a 
Bacillus thuringiensis insecticidal protein toxin 
gene and which is more highly expressed in plants, 
wherein the coding sequence of said synthetic 
gene is modified to contain: 

a) a greater number of codons preferred by the 
intended plant host than said insecticidal protein 
toxin gene 

b) fewer polyadenylation signal sequences than 
said insecticidal protein toxin gene 

'365 Patent Claim 5 

5. A modified chimeric gene comprising 

a promoter which functions in plant cells 
operably linked to a structural coding sequence 
and a 3' non-translated region comprising a 
polyadenylation sequence which functions in 
plants to cause the addition of polyadenylate 
nucleotides to the 3' end of the RNA, 

wherein said structural coding sequence 
encodes a toxin protein derived from a Bacillus 
thuringiensis protein, wherein said structural 
coding sequence comprises a DNA sequence 
which differs from the naturally occurring DNA 
sequence encoding said Bacillus thuringiensis 
protein and comprises the following 
characteristics: 

said naturally occurring DNA sequence 
comprises a region having the following 
sequence: 

TTAATTAACCAAAGAATAGAAGAATTCGCTAGGAAC 
1   5    10   15   20   25   30   35 

and where said structural coding sequence 
comprises modifications so that at least said 
region contains at least one fewer sequence 
selected from the group consisting of plant 
polyadenylation sequences and an ATTTA 
sequence, 

and where said modifications increase the 
number of plant preferred codons in said 
structural coding sequence. 
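
The "fewer polyadenylation signal sequences" and "ATTTA sequence" elements compared in this exhibit are countable properties of a DNA string. The following is a minimal sketch in Python, not part of the original exhibit. The motif list `ASSUMED_SIGNALS` is an illustrative assumption and is not the list recited in either claim, and the choice of the "modified" stretch (36 bases copied from the row of SEQ ID NO:24 ending at position 300) is likewise an assumed alignment made only for the example.

```python
# Illustrative motif list (an assumption, not taken from the claims).
ASSUMED_SIGNALS = ["AATAAA", "AATTAA", "AACCAA"]

def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlapping matches."""
    seq, motif = seq.upper(), motif.upper()
    return sum(1 for i in range(len(seq) - len(motif) + 1)
               if seq.startswith(motif, i))

def signal_profile(seq):
    """Return the total count of assumed signals and the count of ATTTA."""
    signals = sum(count_overlapping(seq, m) for m in ASSUMED_SIGNALS)
    return {"signals": signals, "ATTTA": count_overlapping(seq, "ATTTA")}

# Region recited in '365 patent claim 5.
native_region = "TTAATTAACCAAAGAATAGAAGAATTCGCTAGGAAC"
# Assumed counterpart: 36 bases from the SEQ ID NO:24 row ending at 300.
modified_region = "ttgatcaaccagaggatcgaagagttcgccaggaac"

print(signal_profile(native_region))
print(signal_profile(modified_region))
```

Under these assumptions the recited native region contains two of the listed motifs and the assumed counterpart contains none, which illustrates the kind of reduction the claim language describes. It does not establish which motifs either claim actually covers.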



